A collection of Python scripts for data processing, transformation, and analysis. Handle CSV, JSON, and Excel files with ease.
- CSV data processing and transformation
- Excel file manipulation
- JSON data conversion
- Data cleaning and validation
- Batch processing support
git clone https://github.com/YOUR_USERNAME/data-processing-scripts.git
cd data-processing-scripts
pip install -r requirements.txt

# Process CSV file
python csv_processor.py input.csv output.csv
# Clean data
python data_cleaner.py data.csv
# Convert formats
python format_converter.py data.json data.csv

Process and transform CSV files with filtering, sorting, and aggregation.
python csv_processor.py input.csv output.csv --filter "age>25" --sort name

Clean and validate data by removing duplicates and handling missing values.
python data_cleaner.py input.csv --remove-duplicates --fill-missing

Convert between CSV, JSON, and Excel formats.
python format_converter.py input.csv output.json
python format_converter.py data.json data.xlsx

Process multiple files in batch mode.
python batch_processor.py --input-dir ./data --output-dir ./processed

data-processing-scripts/
├── README.md            # Documentation
├── requirements.txt     # Dependencies
├── csv_processor.py     # CSV processing
├── data_cleaner.py      # Data cleaning
├── format_converter.py  # Format conversion
├── batch_processor.py   # Batch processing
└── .gitignore           # Git ignore
from csv_processor import CSVProcessor
processor = CSVProcessor('data.csv')
processor.filter(lambda row: row['age'] > 25)
processor.sort_by('name')
processor.save('output.csv')

from data_cleaner import DataCleaner
cleaner = DataCleaner('data.csv')
cleaner.remove_duplicates()
cleaner.fill_missing_values(method='mean')
cleaner.save('clean_data.csv')

from format_converter import convert
convert('data.csv', 'data.json')
convert('data.json', 'data.xlsx')

- Filter rows by conditions
- Sort by columns
- Aggregate data (sum, mean, count)
- Column selection and renaming
- Merge multiple CSV files
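
The filtering, sorting, and aggregation listed above can be sketched with the standard library alone; this is only an illustration, not the code in csv_processor.py, and the sales.csv columns (date, customer, revenue) are borrowed from the examples further down:

import csv
from collections import defaultdict

# Filter rows, sort by a column, then aggregate per group.
with open('sales.csv', newline='', encoding='utf-8') as f:
    rows = [r for r in csv.DictReader(f) if float(r['revenue']) > 1000]

rows.sort(key=lambda r: r['date'])

# Sum, mean, and count of revenue per customer.
stats = defaultdict(lambda: {'sum': 0.0, 'count': 0})
for r in rows:
    stats[r['customer']]['sum'] += float(r['revenue'])
    stats[r['customer']]['count'] += 1

for customer, s in stats.items():
    print(customer, s['sum'], s['sum'] / s['count'], s['count'])
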
- Remove duplicate rows
- Handle missing values (fill, drop, interpolate)
- Remove outliers
- Standardize data formats
- Validate data types
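
The cleaning steps above map naturally onto pandas; a minimal sketch, assuming pandas is installed (data_cleaner.py may implement them differently; users.csv and the age column are borrowed from the other examples in this README):

import pandas as pd  # assumption: pandas is available via requirements.txt

df = pd.read_csv('users.csv')

df = df.drop_duplicates()                       # remove duplicate rows
df['age'] = df['age'].fillna(df['age'].mean())  # fill missing values with the mean

# Treat values more than 3 standard deviations from the mean as outliers.
mean, std = df['age'].mean(), df['age'].std()
df = df[(df['age'] - mean).abs() <= 3 * std]

df['age'] = df['age'].astype(int)               # standardize/validate the column type

df.to_csv('users_clean.csv', index=False)
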
- CSV ↔ JSON ↔ Excel
- Preserves data types
- Handles large files
- Custom encoding support
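
A bare-bones illustration of the CSV-to-JSON direction with a custom source encoding (format_converter.py itself may handle type preservation and large files differently):

import csv
import json

# Read a Latin-1 encoded CSV and write it back out as UTF-8 JSON records.
# Note: DictReader yields strings; a real converter would also restore numeric types.
with open('data.csv', newline='', encoding='latin-1') as src:
    records = list(csv.DictReader(src))

with open('data.json', 'w', encoding='utf-8') as dst:
    json.dump(records, dst, indent=2)
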
- Process directory of files
- Parallel processing support
- Progress tracking
- Error handling and logging
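
One way the directory scan, parallel workers, and logging could fit together, using only the standard library (a sketch, not the actual batch_processor.py; the ./data and ./processed directories match the quick-start example):

import logging
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

logging.basicConfig(level=logging.INFO)

def process_file(path: Path) -> Path:
    out = Path('./processed') / path.name
    out.write_text(path.read_text())  # placeholder for the real per-file transformation
    return out

if __name__ == '__main__':
    Path('./processed').mkdir(exist_ok=True)
    files = sorted(Path('./data').glob('*.csv'))
    with ProcessPoolExecutor(max_workers=4) as pool:
        for done in pool.map(process_file, files):
            logging.info('processed %s', done)
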
Create a config.json file:
{
"encoding": "utf-8",
"delimiter": ",",
"chunk_size": 10000,
"parallel_workers": 4
}
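
Whether the scripts actually read this file at startup is an assumption here; a minimal sketch of picking up the encoding and delimiter settings:

import csv
import json

with open('config.json', encoding='utf-8') as f:
    config = json.load(f)

with open('data.csv', newline='', encoding=config['encoding']) as f:
    rows = list(csv.DictReader(f, delimiter=config['delimiter']))
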
python csv_processor.py sales.csv filtered_sales.csv \
--filter "revenue>1000" \
--sort date \
--columns date,customer,revenue

python data_cleaner.py users.csv \
--remove-duplicates \
--fill-missing mean \
--remove-outliers

python batch_processor.py \
--input-dir ./raw_data \
--output-dir ./processed \
--format json \
--parallel

For large files, use chunking:
processor.process_chunks(chunk_size=10000)

Specify encoding:
processor = CSVProcessor('data.csv', encoding='latin-1')

Enable parallel processing:
python batch_processor.py --parallel --workers 8

- Validate input data before processing
- Use chunking for large files
- Enable logging for debugging
- Backup data before transformation
- Test on sample before full processing
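
A rough sketch combining several of these practices (backup, chunked processing, logging); it assumes pandas is installed and is not taken from the scripts themselves:

import logging
import shutil

import pandas as pd  # assumption: pandas is available

logging.basicConfig(level=logging.INFO)
shutil.copy('data.csv', 'data.csv.bak')  # back up before transforming

# Process in 10,000-row chunks, flagging missing values as we go.
for i, chunk in enumerate(pd.read_csv('data.csv', chunksize=10000)):
    if chunk.isnull().values.any():
        logging.warning('chunk %d contains missing values', i)
    chunk.to_csv('processed.csv', mode='a', header=(i == 0), index=False)
    logging.info('chunk %d written', i)
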
- Fork the repository
- Create feature branch
- Commit changes
- Push to branch
- Open Pull Request
MIT License - see LICENSE file
Made with ❤️ for data engineers
"""Documentation updated"""
@decorator def enhanced_function(): """Enhanced functionality""" return improved_result()