import os
import zipfile

def extract_and_truncate(zip_path, output_dir, char_limit=20000):
    """
    Extracts a ZIP file and truncates text files to a specified character limit.
    
    :param zip_path: Path to the ZIP file.
    :param output_dir: Directory to save the extracted and truncated files.
    :param char_limit: Maximum number of characters per text file.
    """
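    # Create the output directory if needed (no error if it already exists)
    os.makedirs(output_dir, exist_ok=True)
    
    # Extract every member of the archive into output_dir
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(output_dir)
    
    # Walk output_dir recursively and truncate each .txt file in place.
    # Note: this also affects .txt files that were already in output_dir.
    for root, _, files in os.walk(output_dir):
        for file in files:
            if file.lower().endswith('.txt'):
                file_path = os.path.join(root, file)
                
                # Read at most char_limit characters (Unicode code points, not bytes)
                with open(file_path, 'r', encoding='utf-8') as f:
                    truncated_content = f.read(char_limit)
                
                # Overwrite the file with the truncated text
                with open(file_path, 'w', encoding='utf-8') as f:
                    f.write(truncated_content)
                
                print(f"Processed: {file} (Truncated to {char_limit} chars)")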

# Example usage
zip_path = r"C:\Users\User\Desktop\0108_데이터.zip"  # path to the ZIP file
output_folder = r"C:\Users\User\Desktop\data0108"  # output folder
extract_and_truncate(zip_path, output_folder)


Processed: 1. 브라운의 완벽한 고백 (이정석) (Z-Library).txt (Truncated to 20000 chars)
Processed: Please Look After Mom (Korean Edition) (Kyung-Sook Shin) (Z-Library).txt (Truncated to 20000 chars)
Processed: 개미 1 (베르나르 베르베르  이세욱 옮김) (Z-Library).txt (Truncated to 20000 chars)
Processed: 개미 2 (베르나르 베르베르  이세욱 옮김) (Z-Library).txt (Truncated to 20000 chars)
Processed: 개미 3 (베르나르 베르베르  이세욱 옮김) (Z-Library).txt (Truncated to 20000 chars)
Processed: 개미 4 (베르나르 베르베르  이세욱 옮김) (Z-Library).txt (Truncated to 20000 chars)
Processed: 개미 5 (베르나르 베르베르  이세욱 옮김) (Z-Library).txt (Truncated to 20000 chars)
Processed: 그리스인 조르바 (니코스 카잔차키스 저, 이재형 역) (Z-Library).txt (Truncated to 20000 chars)
Processed: 그리스인 조르바 (영문판) (니코스 카잔차키스) (Z-Library).txt (Truncated to 20000 chars)
Processed: 그해 여름 끝 (옌롄커) (Z-Library).txt (Truncated to 20000 chars)
Processed: 김약국의 딸들 (박경리) (Z-Library).txt (Truncated to 20000 chars)
Processed: 나미야 잡화점의 기적 (히가시노 게이고) (Z-Library).txt (Truncated to 20000 chars)
Processed: 나와 너의 365일 (유이하 저, 김지연 역) (Z-L