# This File is Used to Chunk my Large Files

Some of my files were too large to upload to HuggingFace, so I needed to (quickly) break them into smaller chunks and then reconstitute them in code.

Note: The base of this code was AI-generated by ChatGPT. Here is my prompt:

```
I have a csv file too big for github. Can you write a python script that breaks it into smaller files and then recombines it?
```

It provided the code below, which I modified, to chunk up the large CSV files and then recombine them.
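
Since the point of chunking is to keep each file under an upload limit, the `rows_per_chunk` value passed to `split_csv` below could be estimated from a byte budget instead of guessed. The following is a rough sketch under stated assumptions: `estimate_rows_per_chunk`, `max_chunk_bytes`, and `sample_rows` are hypothetical names that are not part of the original script, and the estimate averages the size of physical lines, so quoted fields containing newlines or values that pandas re-serializes differently will make it approximate.

```python
def estimate_rows_per_chunk(file_path, max_chunk_bytes, sample_rows=1000):
    """Estimate how many rows fit in a chunk of at most max_chunk_bytes.

    Hypothetical helper: averages the byte length of the first
    `sample_rows` data lines and divides the byte budget by that average.
    """
    with open(file_path, "rb") as f:
        header = f.readline()  # every chunk file repeats the header line
        sampled_bytes = 0
        sampled_rows = 0
        for line in f:
            sampled_bytes += len(line)
            sampled_rows += 1
            if sampled_rows >= sample_rows:
                break

    if sampled_rows == 0:
        return 1  # header-only file: any positive chunk size works

    avg_row_bytes = sampled_bytes / sampled_rows
    budget = max_chunk_bytes - len(header)
    # Never return less than one row per chunk.
    return max(1, int(budget // avg_row_bytes))
```

Because the result is only an estimate, it is probably safest to pass a budget somewhat below the real upload limit.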

In [None]:
import os
import pandas as pd

def split_csv(file_path, output_dir, rows_per_chunk=1000):  # added the rows_per_chunk argument and changed the chunk size
    """
    Splits a CSV file into smaller chunks.

    Args:
        file_path (str): Path to the original CSV file.
        output_dir (str): Directory to store the smaller chunk files.
        rows_per_chunk (int): Number of rows per chunk.
    """
    os.makedirs(output_dir, exist_ok=True)
    df_iter = pd.read_csv(file_path, chunksize=rows_per_chunk)
    
    for i, chunk in enumerate(df_iter):
        chunk_file = os.path.join(output_dir, f"chunk_{i:03}.csv")
        chunk.to_csv(chunk_file, index=False)
        print(f"Saved: {chunk_file}")

def combine_csv_chunks(input_dir, output_file):
    """
    Recombines CSV chunks into a single CSV file.

    Args:
        input_dir (str): Directory containing CSV chunk files.
        output_file (str): Path to save the recombined CSV file.
    """
    chunk_files = sorted(
        [f for f in os.listdir(input_dir) if f.startswith("chunk_") and f.endswith(".csv")]
    )

    df_list = []
    for chunk_file in chunk_files:
        chunk_path = os.path.join(input_dir, chunk_file)
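        # index_col=0 reads the first column as the index; it is then dropped
        # by ignore_index=True / index=False when the combined file is written
        df = pd.read_csv(chunk_path, index_col=0)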
        df_list.append(df)
        print(f"Loaded: {chunk_file}")

    full_df = pd.concat(df_list, ignore_index=True)
    full_df.to_csv(output_file, index=False)
    print(f"Recombined CSV saved to: {output_file}")
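
After a split and recombine, it can be worth checking that nothing was lost. The sketch below is one possible check, not part of the original script: `csv_rows_match` is a hypothetical helper that uses only the standard-library `csv` module and compares two files cell by cell as text. It assumes both files are UTF-8 encoded.

```python
import csv
from itertools import zip_longest


def csv_rows_match(path_a, path_b):
    """Return True if two CSV files parse to the same header and rows."""
    with open(path_a, newline="", encoding="utf-8") as fa, \
         open(path_b, newline="", encoding="utf-8") as fb:
        rows_a = csv.reader(fa)
        rows_b = csv.reader(fb)
        # zip_longest pads the shorter file with None, so a difference
        # in row count shows up as an unequal pair.
        return all(a == b for a, b in zip_longest(rows_a, rows_b))
```

The comparison is strict: if pandas writes a value in a different textual form than the original file used, the check reports a mismatch even though the data is equivalent. Note also that `combine_csv_chunks` as written above reads each chunk with `index_col=0` and writes with `index=False`, so the first column of the chunks is not written back; a comparison against the original file would be expected to fail unless dropping that column is intended.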


In [None]:
# run this to break down the sentiment file
# split_csv('complete_sentiment.csv','sentiment_brokendown')

Saved: sentiment_brokendown/chunk_000.csv
Saved: sentiment_brokendown/chunk_001.csv
Saved: sentiment_brokendown/chunk_002.csv
Saved: sentiment_brokendown/chunk_003.csv
Saved: sentiment_brokendown/chunk_004.csv
Saved: sentiment_brokendown/chunk_005.csv
Saved: sentiment_brokendown/chunk_006.csv
Saved: sentiment_brokendown/chunk_007.csv
Saved: sentiment_brokendown/chunk_008.csv
Saved: sentiment_brokendown/chunk_009.csv
Saved: sentiment_brokendown/chunk_010.csv
Saved: sentiment_brokendown/chunk_011.csv
Saved: sentiment_brokendown/chunk_012.csv
Saved: sentiment_brokendown/chunk_013.csv
Saved: sentiment_brokendown/chunk_014.csv
Saved: sentiment_brokendown/chunk_015.csv
Saved: sentiment_brokendown/chunk_016.csv
Saved: sentiment_brokendown/chunk_017.csv
Saved: sentiment_brokendown/chunk_018.csv
Saved: sentiment_brokendown/chunk_019.csv
Saved: sentiment_brokendown/chunk_020.csv
Saved: sentiment_brokendown/chunk_021.csv
Saved: sentiment_brokendown/chunk_022.csv
Saved: sentiment_brokendown/chunk_

In [None]:
# run this to break down the embedding file
# split_csv('st_embeddings.csv','embed_brokendown')

Saved: embed_brokendown/chunk_000.csv
Saved: embed_brokendown/chunk_001.csv
Saved: embed_brokendown/chunk_002.csv
Saved: embed_brokendown/chunk_003.csv
Saved: embed_brokendown/chunk_004.csv
Saved: embed_brokendown/chunk_005.csv
Saved: embed_brokendown/chunk_006.csv
Saved: embed_brokendown/chunk_007.csv
Saved: embed_brokendown/chunk_008.csv
Saved: embed_brokendown/chunk_009.csv
Saved: embed_brokendown/chunk_010.csv
Saved: embed_brokendown/chunk_011.csv
Saved: embed_brokendown/chunk_012.csv
Saved: embed_brokendown/chunk_013.csv
Saved: embed_brokendown/chunk_014.csv
Saved: embed_brokendown/chunk_015.csv
Saved: embed_brokendown/chunk_016.csv
Saved: embed_brokendown/chunk_017.csv
Saved: embed_brokendown/chunk_018.csv
Saved: embed_brokendown/chunk_019.csv
Saved: embed_brokendown/chunk_020.csv
Saved: embed_brokendown/chunk_021.csv
Saved: embed_brokendown/chunk_022.csv
Saved: embed_brokendown/chunk_023.csv
Saved: embed_brokendown/chunk_024.csv
Saved: embed_brokendown/chunk_025.csv
Saved: embed

In [None]:
# run this to recombine the sentiment file
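# combine_csv_chunks requires an output path; the original file name is assumed here
combine_csv_chunks('sentiment_brokendown', 'complete_sentiment.csv')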

Loaded: chunk_000.csv
Loaded: chunk_001.csv
Loaded: chunk_002.csv
Loaded: chunk_003.csv
Loaded: chunk_004.csv
Loaded: chunk_005.csv
Loaded: chunk_006.csv
Loaded: chunk_007.csv
Loaded: chunk_008.csv
Loaded: chunk_009.csv
Loaded: chunk_010.csv
Loaded: chunk_011.csv
Loaded: chunk_012.csv
Loaded: chunk_013.csv
Loaded: chunk_014.csv
Loaded: chunk_015.csv
Loaded: chunk_016.csv
Loaded: chunk_017.csv
Loaded: chunk_018.csv
Loaded: chunk_019.csv
Loaded: chunk_020.csv
Loaded: chunk_021.csv
Loaded: chunk_022.csv
Loaded: chunk_023.csv
Loaded: chunk_024.csv
Loaded: chunk_025.csv
Loaded: chunk_026.csv
Loaded: chunk_027.csv
Loaded: chunk_028.csv
Loaded: chunk_029.csv
Loaded: chunk_030.csv
Loaded: chunk_031.csv
Loaded: chunk_032.csv
Loaded: chunk_033.csv
Loaded: chunk_034.csv
Loaded: chunk_035.csv
Loaded: chunk_036.csv
Loaded: chunk_037.csv
Loaded: chunk_038.csv
Loaded: chunk_039.csv
Loaded: chunk_040.csv
Loaded: chunk_041.csv
Loaded: chunk_042.csv
Loaded: chunk_043.csv
Loaded: chunk_044.csv
Loaded: ch

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,index,line,character,quote,scene,location,view,episode,date,series,file,sentiment
0,0,0,0,0,QUARK,"You know, Morn -- there's nothing quite as inv...",Al INT. QUARK'S,QUARK'S,INT.,STAR TREK: DEEP SPACE NINE,1996-08-29,Deep Space Nine,504.txt,0.0
1,1,1,1,1,ROM,What's this?,Al INT. QUARK'S,QUARK'S,INT.,STAR TREK: DEEP SPACE NINE,1996-08-29,Deep Space Nine,504.txt,0.0
2,2,2,2,2,QUARK,"What do you mean, ""what's this?"" It's puree of...",Al INT. QUARK'S,QUARK'S,INT.,STAR TREK: DEEP SPACE NINE,1996-08-29,Deep Space Nine,504.txt,0.0
3,3,3,3,3,ROM,I didn't order it.,Al INT. QUARK'S,QUARK'S,INT.,STAR TREK: DEEP SPACE NINE,1996-08-29,Deep Space Nine,504.txt,0.0
4,4,4,4,4,QUARK,"Of course you ""didn't order it"" -- you don't n...",Al INT. QUARK'S,QUARK'S,INT.,STAR TREK: DEEP SPACE NINE,1996-08-29,Deep Space Nine,504.txt,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
144206,144206,144206,144206,356,RIKER,The preparation for the mission... the play......,65A INT. READY ROOM,READY ROOM,INT.,Frame of Mind,1993-02-16,The Next Generation,247.txt,0.0
144207,144207,144207,144207,357,PICARD,"Get some rest, Number One. We can talk more in...",65A INT. READY ROOM,READY ROOM,INT.,Frame of Mind,1993-02-16,The Next Generation,247.txt,0.0
144208,144208,144208,144208,358,RIKER,Alright... but there's one thing I'd like to d...,65A INT. READY ROOM,READY ROOM,INT.,Frame of Mind,1993-02-16,The Next Generation,247.txt,0.0
144209,144209,144209,144209,359,BEVERLY,Are you sure you want to do this by yourself? ...,66 INT. ASYLUM CELL/THEATER,ASYLUM CELL/THEATER,INT.,Frame of Mind,1993-02-16,The Next Generation,247.txt,0.0
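
Column names such as `Unnamed: 0` are what pandas assigns when a CSV that was saved with its index (the `to_csv` default) is read back without `index_col`. The `Unnamed: 0.2`, `Unnamed: 0.1`, and `Unnamed: 0` columns in the output above look like that pattern repeated over several save/load cycles, although the notebook does not say how they arose. A minimal sketch of the effect and one way to strip such columns, using toy data rather than the real file:

```python
import io

import pandas as pd

# Toy data standing in for the real file (hypothetical values).
df = pd.DataFrame({"character": ["QUARK", "ROM"], "sentiment": [0.0, 0.0]})

# Saving with the default index=True writes the index as a
# header-less first column...
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# ...which read_csv names "Unnamed: 0" when the file is loaded again.
reloaded = pd.read_csv(buf)
print(list(reloaded.columns))  # ['Unnamed: 0', 'character', 'sentiment']

# Drop every leftover index column by its name prefix.
cleaned = reloaded.loc[:, ~reloaded.columns.str.startswith("Unnamed:")]
print(list(cleaned.columns))  # ['character', 'sentiment']
```

Writing with `index=False`, as `split_csv` already does, avoids adding another such column on each save.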
