Add an option to parse only hunk positions by p12tic · Pull Request #68 · matiasb/python-unidiff

p12tic · 2020-04-04T19:09:08Z

In my use case I'm using this great library to figure out how code has moved around. For this, the only things that matter are the hunk positions and sizes. Unfortunately, full diff parsing uses a large amount of memory and is slow, and there's no way to disable it in this library.

This PR adds a new option, only_hunk_positions, that enables a fast code path: it parses the hunk headers, does only minimal parsing of the line contents, and does not store them. Enabling this option results in roughly 20 times less memory use and 6 times faster diff parsing on Python 3.6 (2.5 times faster on Python 2.7).

matiasb · 2020-04-07T22:53:57Z

Looks good! I just pushed an updated PR (#70) that resolves the conflict and includes some refactoring; it also renames the parameter to metadata_only. Let me know if that makes sense to you.
Thanks 👍

Add an option to parse only hunk positions

37480a9

matiasb mentioned this pull request Apr 7, 2020

Added option to only parse diff metadata #70

Merged

matiasb merged commit 3d99a41 into matiasb:master Apr 22, 2020

p12tic deleted the only-hunk-positions branch August 8, 2020 17:06

matiasb merged 1 commit into matiasb:master from p12tic:only-hunk-positions
