This library provides an EdText class for selecting and manipulating lines
of text from a string, using addressing inspired by the classic ed text editor.
This isn't on PyPI yet. If you want to use it, install it from GitHub:
python -m pip install git+https://github.com/nedbat/edtext
Suppose we have this file:
# gettysburg.txt

Four score and seven years ago our fathers brought forth
on this continent, a new nation, conceived in Liberty, and
dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that
nation, or any nation so conceived and so dedicated, can long
endure. We are met on a great battle-field of that war. We have
come to dedicate a portion of that field, as a final resting
place for those who here gave their lives that that nation
might live. It is altogether fitting and proper that we should
do this.

-- Abraham Lincoln, Gettysburg PA, 1863
Make an EdText object from the text of the file:
>>> from pathlib import Path
>>> from edtext import EdText
>>> getty = EdText(Path("gettysburg.txt").read_text())
The lines of text are stored for selection and manipulation. The full text is recreated when the object is turned into a string:
>>> str(getty)[:60]
'# gettysburg.txt\n\nFour score and seven years ago our fathers'
Instead of using string slicing, EdText objects provide line selection.
It's available via three aliases: range(), ranges(), or list-like
slicing with square brackets. All do the same operation: select lines based on
the addresses provided, and produce a new EdText object.
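For readers curious how a selection can return a new object while str() rebuilds the text, here is a minimal sketch. MiniEdText is a hypothetical class written only for this illustration, not EdText's real implementation, and it accepts only plain line numbers such as "2" or "2,3" rather than full ed addresses.

```python
class MiniEdText:
    """Hypothetical sketch, not the edtext library's code.

    Stores lines, rebuilds the text in str(), and exposes one selection
    operation under three names: range(), ranges(), and [] indexing.
    """

    def __init__(self, text):
        # Keep line endings so str() can reproduce the input exactly.
        self.lines = text.splitlines(keepends=True)

    def __str__(self):
        return "".join(self.lines)

    def range(self, *addresses):
        selected = []
        for address in addresses:
            # Only "N" or "N,M" is understood in this sketch.
            first, _, last = address.partition(",")
            start = int(first)
            end = int(last) if last.strip() else start
            # Like ed, lines are numbered from 1 and ranges are inclusive.
            selected.extend(self.lines[start - 1:end])
        result = MiniEdText("")
        result.lines = selected
        return result

    # A second name for the same method.
    ranges = range

    def __getitem__(self, key):
        # t["2,3"] passes a string; t["1", "4"] passes a tuple of strings.
        if isinstance(key, tuple):
            return self.range(*key)
        return self.range(key)


t = MiniEdText("a\nb\nc\nd\n")
print(str(t) == "a\nb\nc\nd\n")  # True: the original text is rebuilt
print(t["2,3"], end="")          # lines 2 and 3
```

Keeping line endings with splitlines(keepends=True) is what lets str() reproduce the input exactly, even when the text has no final newline.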
Here we select lines starting from the first line that matches "Four" to the line before the next blank line:
>>> print(getty.range("/Four/; /^$/-"))
Four score and seven years ago our fathers brought forth
on this continent, a new nation, conceived in Liberty, and
dedicated to the proposition that all men are created equal.
The range argument is a string with the ed range to select. In this
example, /Four/ means the first line containing the regex "Four", the
semicolon means that the next address is searched for starting from that
line, /^$/ matches the next blank line, and the trailing - backs up one
line to select the line before the blank line.
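A rough sketch of how addresses like /Four/ and /^$/- could be resolved to line numbers follows. The resolve() function and its address grammar (a /re/, $, or number, followed by +N or -N offsets) are assumptions made for illustration, not edtext's internals. Real ed also wraps a regex search around to the top of the buffer, which this sketch does not do.

```python
import re

# Hypothetical address grammar: an optional base ("/re/", "$", or a number)
# followed by "+N"/"-N" offsets; a bare "+" or "-" means one line.
ADDRESS = re.compile(r"\s*(/(?P<re>(?:[^/\\]|\\.)*)/|\$|\d+)?(?P<offsets>.*)")


def resolve(address, lines, current):
    """Resolve one ed-style address to a 1-based line number.

    `current` is the 1-based line the search starts after (0 means
    "before line 1").
    """
    m = ADDRESS.fullmatch(address)
    base = m.group(1)
    if m.group("re") is not None:
        # /re/ searches forward from the line after `current` (no wrap-around).
        for n in range(current + 1, len(lines) + 1):
            if re.search(m.group("re"), lines[n - 1]):
                line = n
                break
        else:
            raise ValueError(f"no line matches {base}")
    elif base == "$":
        line = len(lines)
    elif base:
        line = int(base)
    else:
        line = current
    # Apply any trailing offsets such as "-", "+2", or "- 1".
    for sign, count in re.findall(r"([+-])\s*(\d*)", m.group("offsets")):
        step = int(count) if count else 1
        line += step if sign == "+" else -step
    return line


lines = ["# title", "", "Four score", "on this", "dedicated", "", "Now we"]
start = resolve("/Four/", lines, 0)    # line 3
end = resolve("/^$/-", lines, start)   # the blank line is 6, minus one is 5
print(lines[start - 1:end])            # ['Four score', 'on this', 'dedicated']
```

Resolving the second address from start is this sketch's version of the semicolon: the search for /^$/ begins after the line that /Four/ found.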
You can pass more than one address range to select several ranges at once:
>>> print(getty.range("/Four/; +2", "$"))
Four score and seven years ago our fathers brought forth
on this continent, a new nation, conceived in Liberty, and
dedicated to the proposition that all men are created equal.
-- Abraham Lincoln, Gettysburg PA, 1863
The /Four/; +2 means the line matching "Four" and the two lines after it. $
means the last line.
With multiple address ranges, each range starts from where the previous range ended.
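To show how several ranges can share a current position, here is a small hypothetical select() function. It uses a stripped-down resolver that understands only /re/, $, +N, and plain line numbers; both names and their behavior are illustrative assumptions, not edtext's API.

```python
import re


def resolve(addr, lines, current):
    """Tiny hypothetical resolver: "/re/", "$", "+N", or a plain line number."""
    addr = addr.strip()
    if len(addr) >= 2 and addr.startswith("/") and addr.endswith("/"):
        # Search forward from the line after `current` (no wrap-around).
        for n in range(current + 1, len(lines) + 1):
            if re.search(addr[1:-1], lines[n - 1]):
                return n
        raise ValueError(f"no line matches {addr}")
    if addr == "$":
        return len(lines)
    if addr.startswith("+"):
        return current + int(addr[1:])
    return int(addr)


def select(lines, *ranges):
    """Select several "A" or "A; B" ranges.

    B is resolved relative to A, and each range starts searching after the
    line where the previous range ended.
    """
    current = 0  # 0 means "before line 1"
    selected = []
    for rng in ranges:
        first, _, second = rng.partition(";")
        start = resolve(first, lines, current)
        end = resolve(second, lines, start) if second.strip() else start
        selected.extend(lines[start - 1:end])
        current = end  # the next range picks up from here
    return selected


lines = ["# title", "", "Four", "on", "dedicated", "", "Now", "-- Lincoln"]
print(select(lines, "/Four/; +2", "$"))  # ['Four', 'on', 'dedicated', '-- Lincoln']
```

Because current carries over, repeating the same search, as in select(lines, "/^$/", "/^$/"), finds two different blank lines rather than the same one twice.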
Although we are using strings to determine line numbers, this feels like
slicing, so square bracket slicing does the same thing as range():
>>> print(getty[r"/Now/;/\./", "$-;$"])
Now we are engaged in a great civil war, testing whether that
nation, or any nation so conceived and so dedicated, can long
endure. We are met on a great battle-field of that war. We have

-- Abraham Lincoln, Gettysburg PA, 1863
Note that you must use strings, not integers, for slicing, and that, as in ed,
lines are numbered starting from 1. To get lines 10 through 12, [10, 12]
won't work; you need to use ["10, 12"]:
>>> print(getty["10, 12"])
come to dedicate a portion of that field, as a final resting
place for those who here gave their lives that that nation
might live. It is altogether fitting and proper that we should
Since we can select a number of ranges at once, ranges() is an alias for
range().
Another operation is EdText.sub(), which makes regex replacements on
selected lines:
>>> print(getty.sub("g/and/", r"e", "E")["1,5"])
# gettysburg.txt

Four scorE and sEvEn yEars ago our fathErs brought forth
on this continEnt, a nEw nation, concEivEd in LibErty, and
dedicated to the proposition that all men are created equal.
The first argument is a range of line addresses: the lines in which to apply
the substitution. The second and third arguments are the regex to find and
its replacement. Note that /pat/ finds only the next matching line, not all
matching lines. Use g/pat/ to select every line matching the pattern.
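The difference between /pat/ and g/pat/, and a substitution limited to selected lines, can be sketched with the standard re module. select_next(), select_global(), and sub_on_lines() are hypothetical helpers written for this illustration, not part of edtext. sub_on_lines() builds a new list, so the original lines are left untouched.

```python
import re


def select_next(lines, pattern, current=0):
    """Like /pat/: only the first matching line after `current` (1-based)."""
    for n in range(current + 1, len(lines) + 1):
        if re.search(pattern, lines[n - 1]):
            return [n]
    return []


def select_global(lines, pattern):
    """Like g/pat/: the 1-based numbers of every matching line."""
    return [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]


def sub_on_lines(lines, linenos, old, new):
    """Return a new list with re.sub(old, new, ...) applied only to chosen lines."""
    chosen = set(linenos)
    return [
        re.sub(old, new, line) if n in chosen else line
        for n, line in enumerate(lines, start=1)
    ]


lines = ["Four score and seven", "on this continent, and", "dedicated to equal"]
print(sub_on_lines(lines, select_global(lines, "and"), "e", "E"))
# ['Four scorE and sEvEn', 'on this continEnt, and', 'dedicated to equal']
print(sub_on_lines(lines, select_next(lines, "and"), "e", "E"))
# ['Four scorE and sEvEn', 'on this continent, and', 'dedicated to equal']
```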
The result of sub() is another EdText object. You can do further
manipulations or selections.
>>> print(getty["g/and/"])
Four score and seven years ago our fathers brought forth
on this continent, a new nation, conceived in Liberty, and
nation, or any nation so conceived and so dedicated, can long
might live. It is altogether fitting and proper that we should
I use cog to interpolate text files or code execution output into documentation, presentations and the like. I often want only a subset of the lines. Over the years I'd built a utility function to make the selection in various ways. It had become baroque, confusing, and cumbersome, and it still didn't do everything I wanted. I realized that ed already had the language I needed for selecting and manipulating text. edtext was born.
For more back-story, see my EdText blog post.
First version.