This repository is part of a larger project that generates and formats datasets in a specified form. It contains scripts that combine individual TXT files into a CSV file and perform other dataset formatting tasks.
The scripts in this repository perform the following tasks:
- Combine all TXT files in a specified directory into a single TXT file.
- Convert the combined TXT file into a CSV file.
- Combine all CSV files in a specified directory into a single CSV file while removing specific unwanted rows.
combine-txt-csv.py combines all .txt files in the txt-files directory into a single combined.txt file, then converts it into a combined.csv file.
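The combine-and-convert step can be sketched roughly as follows. This is not the repository's actual code: the function name `combine_txt_to_csv`, the `out_dir` parameter, and the output location are hypothetical, and the sketch assumes each line of the TXT files is already a comma-separated record, so the "conversion" is a plain copy under a new extension.

```python
from pathlib import Path


def combine_txt_to_csv(src_dir="txt-files", out_dir="."):
    """Concatenate every .txt file in src_dir, then write the result as CSV.

    Hypothetical helper: assumes each TXT line is already a
    comma-separated record, which the original scripts may not do.
    """
    src = Path(src_dir)
    combined_txt = Path(out_dir) / "combined.txt"
    combined_csv = Path(out_dir) / "combined.csv"

    # Sort for a deterministic order and skip the output file itself
    # in case out_dir and src_dir are the same directory.
    txt_files = sorted(
        p for p in src.glob("*.txt") if p.resolve() != combined_txt.resolve()
    )

    with combined_txt.open("w", encoding="utf-8") as out:
        for path in txt_files:
            text = path.read_text(encoding="utf-8")
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")

    # Assumed conversion: the combined text is copied unchanged to the CSV file.
    combined_csv.write_text(
        combined_txt.read_text(encoding="utf-8"), encoding="utf-8"
    )
    return len(txt_files)
```

If the TXT files use a different layout (for example, one field per line), the final copy step would need to be replaced with real parsing, for instance via the standard `csv` module.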
combine-cleanup-csv.py combines all .csv files in the Datasets directory into a single combined.csv file, removing every line that contains "Pronunciation,Latex".
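A minimal sketch of this merge-and-filter behaviour is shown below. The function name `combine_csv_files` and its parameters are hypothetical, and where the real script writes combined.csv is not stated here, so the output path is an assumption.

```python
from pathlib import Path

# The unwanted marker named above (it looks like a repeated header row).
UNWANTED = "Pronunciation,Latex"


def combine_csv_files(src_dir="Datasets", out_path="combined.csv", unwanted=UNWANTED):
    """Merge every .csv file in src_dir, dropping lines that contain `unwanted`.

    Hypothetical helper illustrating the described behaviour.
    """
    src = Path(src_dir)
    out = Path(out_path)

    # Skip the output file in case it lives inside src_dir from an earlier run.
    csv_files = sorted(p for p in src.glob("*.csv") if p.resolve() != out.resolve())

    kept = []
    for path in csv_files:
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if unwanted in line:
                    continue  # drop the unwanted row
                # Normalise the line ending so files without a trailing
                # newline do not run into the next file's first row.
                kept.append(line if line.endswith("\n") else line + "\n")

    out.write_text("".join(kept), encoding="utf-8")
    return len(kept)
```

Note that the filter is a substring match on the raw line, as the description suggests, so a data row that happens to contain the same text would also be removed.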
- Ensure the required directories (txt-files and Datasets) are in the same location as the scripts.
- Run the scripts using Python:

    python combine-txt-csv.py
    python combine-cleanup-csv.py
To delete all .csv files in the txt-files directory for cleanup, use the command for your platform.

Windows:

    del txt-files\*.csv

Linux or macOS:

    rm txt-files/*.csv
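For a platform-independent alternative to the shell commands, the same cleanup can be sketched in Python. The helper name `delete_csv_files` is hypothetical and not part of the repository's scripts.

```python
from pathlib import Path


def delete_csv_files(directory="txt-files"):
    """Delete every .csv file directly inside `directory`; return the removed names."""
    removed = []
    for path in sorted(Path(directory).glob("*.csv")):
        path.unlink()  # remove the file from disk
        removed.append(path.name)
    return removed
```

Calling `delete_csv_files()` from the directory that contains txt-files removes only files ending in .csv and leaves the .txt sources untouched.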