
# Cleaning CSV with Unbalanced Quotes

This notebook demonstrates how to clean a CSV file in which some comma-separated sections contain unbalanced double-quote characters.



## Approach

1. Read the CSV file line by line.
2. For each line, split on commas and remove all double-quote characters from any section that contains an odd number of them.
3. Parse the corrected lines into a pandas DataFrame.


## Import Necessary Libraries

In [None]:

import pandas as pd
from io import StringIO


## Define the Cleaning Function

In [None]:

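def clean_line(line):
    """Strip double quotes from every comma-separated section that has an odd number of them."""
    # Note: a plain split on ',' also breaks apart quoted fields that contain commas.
    sections = line.split(',')
    cleaned_sections = []
    for section in sections:
        if section.count('"') % 2 != 0:  # Odd count means the quotes are unbalanced
            section = section.replace('"', '')  # Remove all double quotes from this section
        cleaned_sections.append(section)
    return ','.join(cleaned_sections)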
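
To make the behavior of `clean_line` concrete, here is a small sketch that runs it on three sample lines. The sample lines are hypothetical and invented for illustration, and the function is repeated so the cell runs on its own. The third line shows a limitation worth keeping in mind: because the line is split on every comma, a properly quoted field that contains a comma is treated as two unbalanced sections and ends up as two separate fields.

```python
def clean_line(line):
    """Same logic as the cleaning function above, repeated so this sketch is self-contained."""
    cleaned_sections = []
    for section in line.split(','):
        if section.count('"') % 2 != 0:  # Odd number of quotes: unbalanced
            section = section.replace('"', '')
        cleaned_sections.append(section)
    return ','.join(cleaned_sections)


# Hypothetical sample lines, invented for illustration
balanced = '1,"Alice",30'            # quotes are balanced, line is left untouched
stray_quote = '2,"Bob,25'            # one stray quote, which gets removed
quoted_comma = '3,"Smith, John",40'  # valid CSV field containing a comma

result_balanced = clean_line(balanced)
result_stray = clean_line(stray_quote)
result_quoted_comma = clean_line(quoted_comma)

print(result_balanced)      # 1,"Alice",30
print(result_stray)         # 2,Bob,25
print(result_quoted_comma)  # 3,Smith, John,40  (now four fields instead of three)
```

If the data may contain legitimately quoted commas, this per-section approach is probably not sufficient by itself and the lines would need a quote-aware parser instead.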


## Load and Clean the CSV File

In [None]:

def load_and_clean_csv(file_path):
    """Read a CSV file, clean each line with clean_line, and return a DataFrame."""
    cleaned_lines = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            # Drop the trailing newline (and surrounding whitespace) before cleaning
            cleaned_lines.append(clean_line(line.strip()))
    # Rebuild the cleaned text and let pandas parse it from memory
    cleaned_content = '\n'.join(cleaned_lines)
    return pd.read_csv(StringIO(cleaned_content))


## Example Usage

In [None]:

# Replace 'path_to_your_csv_file.csv' with the actual path to your CSV file
file_path = 'path_to_your_csv_file.csv'
df = load_and_clean_csv(file_path)
print(df)
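
For a quick end-to-end check without a real data file, the sketch below writes a small sample to a temporary file and runs the same pipeline on it. The sample rows and the temporary file are assumptions made up for this illustration, and both functions are repeated (mirroring the cells above) so the cell runs on its own.

```python
import os
import tempfile
from io import StringIO

import pandas as pd


def clean_line(line):
    """Remove all double quotes from any section with an odd number of them."""
    cleaned_sections = []
    for section in line.split(','):
        if section.count('"') % 2 != 0:
            section = section.replace('"', '')
        cleaned_sections.append(section)
    return ','.join(cleaned_sections)


def load_and_clean_csv(file_path):
    """Clean every line of the file, then parse the result with pandas."""
    with open(file_path, 'r', encoding='utf-8') as file:
        cleaned_lines = [clean_line(line.strip()) for line in file]
    return pd.read_csv(StringIO('\n'.join(cleaned_lines)))


# Hypothetical sample data: rows 2 and 3 each contain one stray quote
sample = 'id,name,age\n1,"Alice",30\n2,"Bob,25\n3,Carol",41\n'

# Write the sample to a temporary file so load_and_clean_csv can open it by path
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

try:
    df = load_and_clean_csv(tmp_path)
finally:
    os.remove(tmp_path)  # Clean up the temporary file

print(df)
```

With this sample, the stray quotes in the second and third data rows are removed and all three rows should parse into the `id`, `name`, and `age` columns.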
