Skip to content

Add XML parsing script#129

Merged
Antoinelfr merged 6 commits intomainfrom
xml-script
Mar 18, 2025
Merged

Add XML parsing script#129
Antoinelfr merged 6 commits intomainfrom
xml-script

Conversation

@AdrianDAlessandro
Copy link
Copy Markdown
Collaborator

Description

This PR adds the XML parsing script as-is into the repo. The only changes made to it have been to keep pre-commit happy and to wrap the script in an if __name__ == "__main__": block.

Fixes #124

Type of change

  • Documentation (non-breaking change that adds or improves the documentation)
  • New feature (non-breaking change which adds functionality)
  • Optimization (non-breaking, back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (whatever its nature)

Key checklist

  • All tests pass (eg. pytest)
  • The documentation builds and looks OK (eg. mkdocs)
  • Pre-commit hooks run successfully (eg. pre-commit run --all-files)

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added or an issue has been opened to tackle that in the future. (Indicate issue here: # (issue))

Copy link
Copy Markdown
Collaborator

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made one small suggestion. Other than that, how about a docstring at the top of the file just saying what the file is for and why it is the way it is? Doesn't have to be a long one.

Comment thread autocorpus/parse_xml.py Outdated
######### INPUT #########

### A directory containing the path to the XML files to be processed, is a list of str
all_files = glob.glob("./xml_hackathon/*.xml")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the plan was to commit this script verbatim for now, but I'm wondering if we could make this script a lot more useful by just allowing the user to pass in paths to one or more XML files as arguments, so people can do:

python -m autocorpus.parse_xml my_file.xml

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this now. Can you merge this if you're happy with it?

Copy link
Copy Markdown
Collaborator

@Antoinelfr Antoinelfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

@Antoinelfr Antoinelfr merged commit dfeecef into main Mar 18, 2025
10 checks passed
@AdrianDAlessandro AdrianDAlessandro deleted the xml-script branch May 20, 2025 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add XML parsing code as-is

3 participants