# Cleaning CSV with Mixed Quoting

This notebook demonstrates how to clean a CSV file containing a mixture of valid and invalid quoting, preserving the valid quotes while removing or correcting the invalid ones.

## Approach

1. Read the CSV file line by line.
2. Apply custom rules to identify and remove or replace invalid quotes.
3. Parse the corrected lines into a pandas DataFrame.

## Import Necessary Libraries

In [None]:
import pandas as pd
from io import StringIO

## Define the Cleaning Function

In [None]:
def clean_line(line):
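    # Add custom rules here based on the patterns of invalid quotes in your file.
    # Example: unwrap a line enclosed in one stray pair of quotes ('"a,b,c"' -> 'a,b,c').
    # The interior must contain no quotes; otherwise a valid line such as
    # '"a","b"' would be corrupted into 'a","b'.
    if len(line) >= 2 and line.startswith('"') and line.endswith('"') and '"' not in line[1:-1]:
        line = line[1:-1]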
    return line
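
Another pattern that may appear in files like this is a bare quote inside a quoted field, for example `"5" pipe"`. The following is a minimal sketch of one additional rule, not a general solution. The function name `escape_inner_quotes` is hypothetical, and the rule assumes a comma delimiter, one record per line, and that a quote is structural only when it sits at a field boundary (line start, line end, or next to a comma). A bare quote that is immediately followed by a comma inside a field remains ambiguous and would need its own rule.

```python
import re

# A quoted field: the opening quote is at the line start or right after a comma,
# and the closing quote is the first one followed by a comma or the line end.
_QUOTED_FIELD = re.compile(r'(?:^|(?<=,))"(.*?)"(?=,|$)')
# Inside a field: either an already-escaped pair ("") or a single bare quote.
_QUOTE_RUN = re.compile(r'""|"')


def escape_inner_quotes(line):
    """Double bare quotes inside quoted fields; leave existing "" pairs alone."""
    def fix(match):
        # Pairs are replaced by themselves, bare quotes become pairs.
        inner = _QUOTE_RUN.sub('""', match.group(1))
        return '"' + inner + '"'
    return _QUOTED_FIELD.sub(fix, line)


print(escape_inner_quotes('1,"5" pipe",x'))
```

A rule like this could be called from inside `clean_line`. Unquoted fields are left untouched, since most parsers treat a quote in the middle of an unquoted field as a literal character.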

## Load and Clean the CSV File

In [None]:
def load_and_clean_csv(file_path):
    cleaned_lines = []
    with open(file_path, 'r', encoding='utf-8') as file:
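        # Reading line by line assumes that no quoted field spans multiple lines
        for line in file:
            # Remove only the line terminator; strip() would also drop leading or
            # trailing spaces and tabs that may belong to a field
            cleaned_line = clean_line(line.rstrip('\r\n'))
            cleaned_lines.append(cleaned_line)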
    cleaned_content = '\n'.join(cleaned_lines)
    file_like_object = StringIO(cleaned_content)
    df = pd.read_csv(file_like_object)
    return df
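
Because the cleaning rules are heuristic, it can help to check which lines still do not split into the expected number of fields before passing them to pandas. The following is a minimal sketch using the standard-library `csv` module. The helper name `find_bad_lines` is hypothetical, and it assumes the first line is the header and that every record occupies a single line.

```python
import csv


def find_bad_lines(lines, delimiter=','):
    """Return (line_number, line) pairs whose field count differs from the header's."""
    lines = list(lines)
    if not lines:
        return []
    # Parse the header on its own to learn the expected number of fields.
    expected = len(next(csv.reader([lines[0]], delimiter=delimiter)))
    bad = []
    for number, line in enumerate(lines[1:], start=2):
        fields = next(csv.reader([line], delimiter=delimiter))
        if len(fields) != expected:
            bad.append((number, line))
    return bad


sample = [
    'id,name,note',
    '1,"Smith, J",ok',
    '2,Jones,bad,extra',
]
print(find_bad_lines(sample))
```

Line numbers are 1-based with the header as line 1, so they can be matched against the original file when deciding which new rule to add to `clean_line`.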

## Example Usage

In [None]:
# Replace 'path_to_your_csv_file.csv' with the actual path to your CSV file
file_path = 'path_to_your_csv_file.csv'
df = load_and_clean_csv(file_path)
print(df)