File Parser

A Python application that processes files based on job definitions. It supports multiple file transformations including:

- Extracting ZIP files
- Converting XML files to CSV format

Note: This project simulates S3 paths locally. Any path starting with s3:// will be automatically converted to a local path by replacing s3:// with s3_simulation/. For example, s3://alejo-parsers/file.zip becomes s3_simulation/alejo-parsers/file.zip.
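The path rule above can be sketched in a few lines. This is only an illustration of the described behavior, not the project's actual code; the function name to_local_path and the two constants are hypothetical.

```python
S3_PREFIX = "s3://"            # prefix that marks a simulated S3 path
LOCAL_ROOT = "s3_simulation/"  # local directory that stands in for S3


def to_local_path(path: str) -> str:
    """Map an s3:// path to its simulated local path; leave other paths unchanged."""
    if path.startswith(S3_PREFIX):
        return LOCAL_ROOT + path[len(S3_PREFIX):]
    return path


print(to_local_path("s3://alejo-parsers/file.zip"))
# prints: s3_simulation/alejo-parsers/file.zip
```

Paths that do not start with s3:// are returned as they are in this sketch; the README does not say how the application treats them.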

Note: The job definition file path is hardcoded as job_definition.json in main.py. Make sure to place your job definition file in the project root directory.

Project Structure

.
├── src/
│   ├── parsers/
│   │   ├── base_parser.py      # Base parser class
│   │   ├── zip_parser.py       # ZIP file extraction
│   │   └── xml_parser.py       # XML to CSV conversion
│   ├── config.py               # Configuration and logging setup
│   └── main.py                 # Main application entry point
├── logs/                       # Log files directory
│   └── file_parser.log         # Application logs
├── s3_simulation/              # Local directory for S3 path simulation
└── job_definition.json         # Job configuration file

Requirements

Python 3.x
Required packages:
- pandas
- lxml

Installation

Clone the repository:

git clone https://github.com/rubenoliveros/file_parser.git
cd file_parser

Install dependencies:

pip install pandas lxml

Usage

Create a job definition file (job_definition.json) with your transformations:

{
    "transformations": [
        {
            "object": {
                "parser": "unzip",
                "origin": "s3://alejo-parsers/workspace1/sources/rutafuente1/miarchivo1.zip",
                "destiny": "s3://alejo-parsers/workspace1/sources/rutafuente2/",
                "classname": "ZipFileParser"
            },
            "kwargs": {
                "scripts_path": "scripts/",
                "scripts_bucket": "alejo-scripts"
            }
        },
        {
            "object": {
                "parser": "xml_to_csv",
                "origin": "s3://alejo-parsers/workspace1/sources/rutafuente1/miarchivo2.xml",
                "destiny": "s3://alejo-parsers/workspace1/sources/rutafuente2/",
                "classname": "XmlToCsvParser"
            },
            "kwargs": {
                "scripts_path": "scripts/",
                "scripts_bucket": "alejo-scripts"
            }
        }
    ]
}
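As a rough guide to how this structure can be read, the sketch below loads a job definition and turns each entry into a small record. The Transformation dataclass and the parse_job function are hypothetical names used for illustration; the real main.py may organize this differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Transformation:
    """One entry of the "transformations" list (hypothetical helper type)."""
    parser: str
    origin: str
    destiny: str
    classname: str
    kwargs: dict = field(default_factory=dict)


def parse_job(job_text: str) -> list[Transformation]:
    """Turn the JSON text of a job definition into Transformation records."""
    job = json.loads(job_text)
    return [
        Transformation(**entry["object"], kwargs=entry.get("kwargs", {}))
        for entry in job["transformations"]
    ]


EXAMPLE = """
{
    "transformations": [
        {
            "object": {
                "parser": "unzip",
                "origin": "s3://alejo-parsers/workspace1/sources/rutafuente1/miarchivo1.zip",
                "destiny": "s3://alejo-parsers/workspace1/sources/rutafuente2/",
                "classname": "ZipFileParser"
            },
            "kwargs": {"scripts_path": "scripts/", "scripts_bucket": "alejo-scripts"}
        }
    ]
}
"""

for step in parse_job(EXAMPLE):
    print(step.parser, step.origin, "->", step.destiny)
```

Each entry carries the parser name, the source path (origin), the output location (destiny), and the class expected to handle it.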

Place your input files in the corresponding local directories under s3_simulation/. For example:
- s3_simulation/alejo-parsers/workspace1/sources/rutafuente1/miarchivo1.zip
- s3_simulation/alejo-parsers/workspace1/sources/rutafuente1/miarchivo2.xml

Run the application:

python3 src/main.py

Supported Parsers

ZIP Parser (unzip)
- Extracts contents of a ZIP file to a destination directory
- Example: "parser": "unzip"

XML to CSV Parser (xml_to_csv)
- Converts XML files to CSV format
- CSV headers: name, email, street, city, country
- Example: "parser": "xml_to_csv"
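To give a feel for the XML to CSV step, here is a minimal sketch that produces the five headers listed above. It makes two assumptions that the README does not confirm: the input layout (one person element per record, with the address fields nested inside it) is invented for the example, and the sketch uses the standard library (xml.etree.ElementTree and csv) rather than the project's pandas and lxml dependencies. The function name xml_to_csv_text is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

HEADERS = ["name", "email", "street", "city", "country"]


def xml_to_csv_text(xml_text: str, record_tag: str = "person") -> str:
    """Convert XML text to CSV text, one row per <record_tag> element.

    The record tag and the nesting of the fields are assumptions.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADERS)
    for record in root.iter(record_tag):
        # Look for each field anywhere below the record; missing fields become "".
        writer.writerow([record.findtext(f".//{h}", default="") for h in HEADERS])
    return out.getvalue()


SAMPLE = """<people>
  <person>
    <name>Ana</name>
    <email>ana@example.com</email>
    <address><street>Calle 1</street><city>Bogota</city><country>CO</country></address>
  </person>
</people>"""

print(xml_to_csv_text(SAMPLE))
```

For the sample above, the output is the header row followed by one data row: Ana,ana@example.com,Calle 1,Bogota,CO.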

Logging

The application logs all operations to:

- Console output
- logs/file_parser.log

Log entries include:

- Timestamp
- Log level (INFO/ERROR)
- Operation details
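A logging setup along these lines could look like the sketch below. It uses only the standard logging module; the function name setup_logging, the logger name, and the exact format string are assumptions, and the real config.py may differ.

```python
import logging
import os


def setup_logging(log_path: str = "logs/file_parser.log") -> logging.Logger:
    """Return a logger that writes to both the console and a log file."""
    # Make sure the log directory exists before opening the file handler.
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)

    logger = logging.getLogger("file_parser")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    # Timestamp, log level, then the operation details.
    formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(log_path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling setup_logging() once at startup and then using logger.info(...) and logger.error(...) would send each message to both destinations.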

Error Handling

The application handles various error cases:

- Missing job definition file
- Invalid JSON format
- Unsupported parser types
- File not found errors
- Processing errors

All errors are logged with detailed messages for debugging.
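The first three cases can be illustrated with a short sketch. It is not the project's implementation: the function name validate_job, the SUPPORTED_PARSERS set, and the message wording are hypothetical, and the sketch returns error messages instead of logging them so that it stays self-contained.

```python
import json

SUPPORTED_PARSERS = {"unzip", "xml_to_csv"}


def validate_job(path: str = "job_definition.json") -> tuple[list[dict], list[str]]:
    """Load a job definition; return (usable transformations, error messages)."""
    try:
        with open(path, encoding="utf-8") as f:
            job = json.load(f)
    except FileNotFoundError:
        return [], [f"Job definition not found: {path}"]
    except json.JSONDecodeError as exc:
        return [], [f"Invalid JSON in {path}: {exc.msg}"]

    if not isinstance(job, dict):
        return [], [f"Job definition in {path} must be a JSON object"]

    valid, errors = [], []
    for entry in job.get("transformations", []):
        parser = entry.get("object", {}).get("parser")
        if parser not in SUPPORTED_PARSERS:
            # Skip this entry but keep processing the rest of the job.
            errors.append(f"Unsupported parser: {parser}")
            continue
        valid.append(entry)
    return valid, errors
```

In this sketch one bad entry does not stop the remaining transformations from running; whether the application behaves the same way is not stated here.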

Contributing

Feel free to submit issues and enhancement requests!
