Welcome to the instructory notebook of the Un4CHANate project.

In the first section of this notebook will chunk and clean dataset from the InternetArchive 4Plebs (https://archive.org/details/4plebs-org-data-dump-2024-01). Through the chunking process, the files are divided into smaller, more manageable pieces, making them small enough to be opened and processed on a standard computer without requiring high-performance hardware.

If you are interested in testing our second notebook (for data analysis), a demo csv file is added onto this Github repository.

This notebook extracts the csv file from 4plebs to a smaller csv file only containing the timestamp, comment in the timeframe of interest for your own analysis.

**STEP I:**

Download your chosen dataset at https://archive.org/details/4plebs-org-data-dump-2024-01

*Categories:*
- /b/: Random (the infamous anything-goes board).
- /v/: Video games.
- /pol/: Politically incorrect.
- /a/: Anime & manga.

**STEP II:** 

Write down the the start- and end-date of the timeframe of interest for your analysis (Y-M-D).

In [11]:
from datetime import datetime

# Fill your timeframe of interest in Year-Month-Day format, this code cell will give you the unix-values needed to find the correct timeframe in the larger CSV file.

def convert_to_unix(date_string, date_format="%Y-%m-%d"):

    try:
        # Parse the date string into a datetime object
        date_obj = datetime.strptime(date_string, date_format)
        # Convert the datetime object to a Unix timestamp
        unix_timestamp = int(date_obj.timestamp())
        return unix_timestamp
    except ValueError as e:
        return f"Error: {e}"

# Prompt the user for start and end dates
start_date = input("Enter the start date (format: YYYY-MM-DD): ")
end_date = input("Enter the end date (format: YYYY-MM-DD): ")

# Convert the provided dates to Unix timestamps
timestamp_start = int(convert_to_unix(start_date))
timestamp_end = int(convert_to_unix(end_date))

# Print the results
print(f"Date of start = {start_date}")
print(f"timestamp_start = {timestamp_start}")
print(f"Date of end = {end_date}")
print(f"timestamp_end = {timestamp_end}")



Date of start = 2013-12-01
timestamp_start = 1385852400
Date of end = 2013-12-02
timestamp_end = 1385938800


**STEP III:** 

Write down the correct path which redirects to the CSV containing the dataset downloaded on 4plebs. Create a folder and fill in the path where the chuncked files will show up.

Please note that by changing the columns of interest the notebooks will not work adequately.

This process will take some time, you can wait for all the chuncks to be processed.
Another option would be to open the processed chunks and assess if the number in the 'time' column is corresponding with your timeframe. To find the the number tied to the timeframe read the UNIX number at STEP II.

In [None]:
import pandas as pd
import os
import re

# File paths
input_file = r'C:\Users'  # Direct to the csv file.
output_folder = r'C:\Users' # Direct to the folder in which you want the chuncked files

chunk_size = 10000000 # Number of rows per chunk
columns_to_extract = [4, 22]  # Columns containing the timestamp and comment string
pattern = r">>"  # Regex pattern to match strings containing '>>'

if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Process the file in chunks
chunks = pd.read_csv(
    input_file,
    engine='python',
    chunksize=chunk_size,
    on_bad_lines='skip',
    delimiter=',',
    quoting=3
)

for i, chunk in enumerate(chunks):
    try:
        # Extract specified columns and drop missing values
        selected_columns = chunk.iloc[:, columns_to_extract].copy()
        selected_columns.columns = ['time', 'comment']  # Rename the columns for clarity
        selected_columns = selected_columns.dropna()

        # Filter out rows where 'comment' contains '>>'
        cleaned_data = selected_columns[~selected_columns['comment'].str.contains(pattern, na=False)]

        # Strip whitespace or quotes from the 'time' column and convert to integers
        cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
        cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')

        # Drop rows where conversion resulted in NaN
        cleaned_data = cleaned_data.dropna(subset=['time'])

        # Convert the 'time' column to integers after cleaning
        cleaned_data['time'] = cleaned_data['time'].astype(int)

        # Define the output file path for this chunk
        output_file = os.path.join(output_folder, f"chunk_{i + 1}.csv")

        # Save the cleaned chunk to a CSV file
        cleaned_data.to_csv(output_file, index=False, header=True)
        print(f"Cleaned chunk {i + 1} saved to {output_file}.")

    except Exception as e:
        print(f"Error processing chunk {i + 1}: {e}")

print("All chunks processed and saved successfully.")


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 1 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_1.csv.
Cleaned chunk 2 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_2.csv.
Cleaned chunk 3 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_3.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 4 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_4.csv.
Cleaned chunk 5 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_5.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 6 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_6.csv.
Cleaned chunk 7 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_7.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 8 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_8.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 9 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_9.csv.
Cleaned chunk 10 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_10.csv.
Cleaned chunk 11 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_11.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 12 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_12.csv.
Cleaned chunk 13 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_13.csv.
Cleaned chunk 14 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_14.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 15 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_15.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 16 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_16.csv.
Cleaned chunk 17 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_17.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 18 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_18.csv.
Cleaned chunk 19 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_19.csv.
Cleaned chunk 20 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_20.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 21 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_21.csv.
Cleaned chunk 22 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_22.csv.
Cleaned chunk 23 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_23.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 24 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_24.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 25 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_25.csv.
Cleaned chunk 26 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_26.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 27 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_27.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 28 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_28.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 29 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_29.csv.
Cleaned chunk 30 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_30.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 31 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_31.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 32 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_32.csv.
Cleaned chunk 33 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_33.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 34 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_34.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 35 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_35.csv.
Cleaned chunk 36 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_36.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 37 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_37.csv.
Cleaned chunk 38 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_38.csv.
Cleaned chunk 39 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_39.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 40 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_40.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 41 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_41.csv.
Cleaned chunk 42 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_42.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 43 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_43.csv.
Cleaned chunk 44 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_44.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 45 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_45.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 46 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_46.csv.
Cleaned chunk 47 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_47.csv.
Cleaned chunk 48 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_48.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 49 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_49.csv.
Cleaned chunk 50 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_50.csv.
Cleaned chunk 51 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_51.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 52 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_52.csv.
Cleaned chunk 53 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_53.csv.
Cleaned chunk 54 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_54.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 55 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_55.csv.
Cleaned chunk 56 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_56.csv.
Cleaned chunk 57 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_57.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 58 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_58.csv.
Cleaned chunk 59 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_59.csv.
Cleaned chunk 60 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_60.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 61 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_61.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 62 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_62.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 63 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_63.csv.
Cleaned chunk 64 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_64.csv.
Cleaned chunk 65 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_65.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 66 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_66.csv.
Cleaned chunk 67 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_67.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 68 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_68.csv.
Cleaned chunk 69 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_69.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 70 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_70.csv.
Cleaned chunk 71 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_71.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')


Cleaned chunk 72 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_72.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 73 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_73.csv.
Cleaned chunk 74 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_74.csv.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['time'].astype(str).str.strip(' "')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = pd.to_numeric(cleaned_data['time'], errors='coerce', downcast='integer')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['time'] = cleaned_data['

Cleaned chunk 75 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_75.csv.
Cleaned chunk 76 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_76.csv.
Cleaned chunk 77 saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\chunk_77.csv.


KeyboardInterrupt: 

**STEP IV:**

Fill in the 'filtered_folder' with the folder where you want to save the filtered chunks.

Fill in the 'final_output_file' with the path and name for the final CSV.




In [14]:
import os
import pandas as pd

def filter_chunks_by_timestamp(input_folder, output_folder, timestamp_start, timestamp_end):

    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    for file in os.listdir(input_folder):
        if file.endswith('.csv'):
            file_path = os.path.join(input_folder, file)
            try:
                # Read the CSV file
                df = pd.read_csv(file_path)

                # Ensure the first column is cleaned and converted to integers
                df.iloc[:, 0] = df.iloc[:, 0].astype(str).str.strip(' "')
                df.iloc[:, 0] = pd.to_numeric(df.iloc[:, 0], errors='coerce', downcast='integer')

                # Drop rows where conversion resulted in NaN
                df = df.dropna(subset=[df.columns[0]])

                # Convert the column to integers after cleaning
                df.iloc[:, 0] = df.iloc[:, 0].astype(int)

                # Filter rows based on the timestamp range
                filtered_df = df[(df.iloc[:, 0] >= timestamp_start) & (df.iloc[:, 0] <= timestamp_end)]

                # If there are matching rows, save the filtered chunk to the output folder
                if not filtered_df.empty:
                    output_path = os.path.join(output_folder, file)
                    filtered_df.to_csv(output_path, index=False)
                    print(f"Filtered chunk saved: {output_path}")

            except Exception as e:
                print(f"Error processing {file}: {e}")

def merge_filtered_chunks(filtered_folder, output_file):
  
    dataframes = []

    for file in os.listdir(filtered_folder):
        if file.endswith('.csv'):
            file_path = os.path.join(filtered_folder, file)
            try:
                df = pd.read_csv(file_path)
                dataframes.append(df)
            except Exception as e:
                print(f"Error reading {file}: {e}")

    if dataframes:
        merged_df = pd.concat(dataframes, ignore_index=True)
        merged_df.to_csv(output_file, index=False)
        print(f"Merged CSV saved to {output_file}")
    else:
        print("No CSV files found to merge.")

# Fill in the missing values
input_folder = output_folder
filtered_folder = r"C:\Users"  # Replace with the folder where you want to save the filtered chunks
final_output_file = r"C:\Users.csv"  # Replace with the path and name for the final CSV

filter_chunks_by_timestamp(input_folder, filtered_folder, timestamp_start, timestamp_end)
merge_filtered_chunks(filtered_folder, final_output_file)


1      1385724397
2      1385705498
3      1385706643
4      1385710987
          ...    
182    1385715703
183    1385715924
184    1385716388
185    1385718174
186    1385724923
Name: time, Length: 187, dtype: object' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  df.iloc[:, 0] = df.iloc[:, 0].astype(str).str.strip(' "')
1      1385762118
2      1385762157
3      1385762194
4      1385762205
          ...    
218    1385768078
219    1385768172
220    1385768198
221    1385768210
222    1385768219
Name: time, Length: 223, dtype: object' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  df.iloc[:, 0] = df.iloc[:, 0].astype(str).str.strip(' "')
1      1385768301
2      1385768348
3      1385768381
4      1385768387
          ...    
228    1385774540
229    1385774566
230    1385774573
231    1385774586
232    1385774669
Name: time, Length: 233, dtype: object' has dtype incompatible with int64, please expl

Filtered chunk saved: C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\chunk_22.csv
Filtered chunk saved: C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\chunk_23.csv
Filtered chunk saved: C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\chunk_24.csv
Filtered chunk saved: C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\chunk_25.csv
Filtered chunk saved: C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\chunk_26.csv
Filtered chunk saved: C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\chunk_27.csv
Filtered chunk saved: C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\chunk_28.csv
Filtered chunk saved: C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\chunk_29.csv
Filtered chunk saved: C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\chunk_30.csv
Filtered c

1      1385881980
2      1385881995
3      1385881995
4      1385882160
          ...    
256    1385890267
257    1385890311
258    1385890315
259    1385890332
260    1385890346
Name: time, Length: 261, dtype: object' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  df.iloc[:, 0] = df.iloc[:, 0].astype(str).str.strip(' "')
1      1385890542
2      1385890548
3      1385890569
4      1385890620
          ...    
263    1385896726
264    1385896780
265    1385896780
266    1385896906
267    1385896925
Name: time, Length: 268, dtype: object' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  df.iloc[:, 0] = df.iloc[:, 0].astype(str).str.strip(' "')
1      1385896945
2      1385896955
3      1385896980
4      1385897036
          ...    
222    1385903043
223    1385903045
224    1385903041
225    1385903038
226    1385903072
Name: time, Length: 227, dtype: object' has dtype incompatible with int64, please expl

Merged CSV saved to C:\Users\Nelson Tausk\Documents\data_project_2025\marked_chuncks_v5\filter\___final_dataset___.csv


**DATA_LOADER_4CHAN** Completed! Now proceed to the second notebook to analyze the the dataset!