# Collating for real with CollateX. Files

In this exercise, follow the instructions in this notebook: read the Markdown cells and execute the Code cells (the ones marked with In + a number on their left).

Not sure how to execute cells in a Notebook? Check the [Jupyter Notebook tutorial](../02_PrepareEnvironment/JupyterNotebook.ipynb).

## 1. First exercise (Darwin texts). Read from files and HTML output.

Import the *collatex* Python library

In [1]:
from collatex import *

Create a collation object

In [2]:
collation = Collation()

### Read text from files

Now open the texts in "../data/Darwin" and let Python read them.

The 'par1' in each file name indicates that the file contains only the first paragraph.

The code below shows how Python reads a file: it is not CollateX code, just the general Python way of doing things. Each file is opened, read (using a specific character encoding, here UTF-8) and stored in a variable ('witness_1859', etc.). A variable name cannot contain whitespace!

In [3]:
witness_1859 = open( "../data/Darwin/darwin1859_par1.txt", encoding='utf-8' ).read()
witness_1860 = open( "../data/Darwin/darwin1860_par1.txt", encoding='utf-8' ).read()
witness_1861 = open( "../data/Darwin/darwin1861_par1.txt", encoding='utf-8' ).read()
witness_1866 = open( "../data/Darwin/darwin1866_par1.txt", encoding='utf-8' ).read()
witness_1869 = open( "../data/Darwin/darwin1869_par1.txt", encoding='utf-8' ).read()
witness_1872 = open( "../data/Darwin/darwin1872_par1.txt", encoding='utf-8' ).read()
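The six nearly identical lines above can also be written as a loop. The following is a minimal sketch in plain Python, not part of CollateX: the helper name `read_witnesses` is hypothetical, and it assumes the files follow the `darwin<year>_par1.txt` naming pattern seen above. The `with` statement closes each file as soon as it has been read.

```python
from pathlib import Path


def read_witnesses(directory, years):
    """Return a dict mapping each year (as a string) to the text of its file.

    Hypothetical helper: assumes files are named darwin<year>_par1.txt.
    """
    texts = {}
    for year in years:
        path = Path(directory) / f"darwin{year}_par1.txt"
        # 'with' closes the file automatically once the block ends
        with open(path, encoding="utf-8") as handle:
            texts[str(year)] = handle.read()
    return texts


# Usage in this notebook (uncomment to run against the Darwin files):
# witnesses = read_witnesses("../data/Darwin", [1859, 1860, 1861, 1866, 1869, 1872])
# print(witnesses["1859"])
```

Keeping the texts in a dictionary keyed by year means the year can later double as the witness name.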

Just to be sure that the text in the files has been stored, try to print one of them.

In [4]:
print(witness_1859)

WHEN we look to the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us, is, that they generally differ much more from each other, than do the individuals of any one species or variety in a state of nature. When we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment, I think we are driven to conclude that this greater variability is simply due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species have been exposed under nature. There is, also, I think, some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems pretty clear that organic beings must be exposed during several generations to the new conditions of life to ca

Or another one

In [5]:
print(witness_1872)

Causes of Variability. WHEN we compare the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us is, that they generally differ more from each other than do the individuals of any one species or variety in a state of nature. And if we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment, we are driven to conclude that this great variability is due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species had been exposed under nature. There is, also, some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems clear that organic beings must be exposed during several generations to new conditions to cause any great amount of vari

### Add them to the CollateX instance as witnesses

This is similar to what we did in the previous exercise, but instead of typing the text directly, we pass the variable containing the text read from each file.

In [6]:
collation.add_plain_witness( "1859", witness_1859 )
collation.add_plain_witness( "1860", witness_1860 )
collation.add_plain_witness( "1861", witness_1861 )
collation.add_plain_witness( "1866", witness_1866 )
collation.add_plain_witness( "1869", witness_1869 )
collation.add_plain_witness( "1872", witness_1872 )
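Since each call differs only in the witness name and the variable, the witnesses could also be added in a loop. A minimal sketch: `add_witnesses` is a hypothetical helper, not a CollateX function. It relies only on the `add_plain_witness` method used above and expects a dictionary that maps witness names to texts.

```python
def add_witnesses(collation, texts):
    """Add every (siglum, text) pair in 'texts' to 'collation'.

    Hypothetical helper: 'collation' is any object offering the
    add_plain_witness(siglum, text) method used in this notebook.
    Returns the sigla in the order they were added.
    """
    added = []
    for siglum, text in texts.items():
        collation.add_plain_witness(siglum, text)
        added.append(siglum)
    return added


# Usage with CollateX (assumes the collatex library is installed):
# from collatex import Collation
# collation = Collation()
# add_witnesses(collation, {"1859": witness_1859, "1872": witness_1872})
```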

### New output: HTML table

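When you create the collation result, use the output option to specify the output you want. Here, it is set to html. With this output the table is rendered directly in the notebook, so the value stored in the variable is empty and printing it shows only None.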

In [7]:
alignment_table = collate(collation, layout='vertical', output="html")
print(alignment_table)

1859,1860,1861,1866,1869,1872
-,-,-,Causes of Variability.,Causes of Variability.,Causes of Variability.
WHEN we,WHEN we,WHEN we,WHEN we,WHEN we,WHEN we
look to,look to,look to,look to,compare,compare
"the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us","the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us","the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us","the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us","the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us","the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us"
",",",",",",",",-,-
"is, that they generally differ","is, that they generally differ","is, that they generally differ","is, that they generally differ","is, that they generally differ","is, that they generally differ"
much,-,-,-,-,-
more,more,more,more,-,more
from each other,from each other,from each other,from each other,from each other,from each other
",",-,-,-,more,-


None
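In a vertical table each column is a witness and each row is one aligned segment, so a row marks a variant location whenever its cells are not all identical. The sketch below illustrates that reading on a few rows copied from the table above (the long invariant segment is left out). It is plain Python, not CollateX code, and `variant_rows` is a hypothetical helper; `-` marks a gap, as in the table.

```python
SIGLA = ["1859", "1860", "1861", "1866", "1869", "1872"]

# A few rows copied from the table above; "-" marks a gap.
ROWS = [
    ["-", "-", "-", "Causes of Variability.", "Causes of Variability.", "Causes of Variability."],
    ["WHEN we"] * 6,
    ["look to", "look to", "look to", "look to", "compare", "compare"],
    ["much", "-", "-", "-", "-", "-"],
    ["from each other"] * 6,
]


def variant_rows(rows):
    """Return the rows in which the witnesses do not all agree."""
    return [row for row in rows if len(set(row)) > 1]


for row in variant_rows(ROWS):
    # Pair each reading with its siglum for a readable report
    print(dict(zip(SIGLA, row)))
```

Of the five sample rows, three are reported: the added heading, the change from "look to" to "compare", and the dropped "much".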


## 2. Second exercise (Woolf texts). Read from files and HTML2 output.

In the second exercise, repeat the previous steps, now using the texts in "../data/Woolf/Lighthouse-1" and visualizing the output with the more sophisticated HTML option (HTML2).

We will be using different editions of Virginia Woolf's *To the Lighthouse*:

    USA = New York: Harcourt, Brace & Company, 1927 (1st USA edition)
    UK = London: R & R Clark Limited, 1927 (1st UK edition)
    EM (EVERYMAN) = London: J. M. Dent & Sons LTD, 1938 (reprint 1952)

The facsimiles and transcriptions of the editions are available at http://woolfonline.com/. Please refer to the information in the data directory for the licence of the materials.

Note that the output 'html2' is specified this time: colors should appear!

In [8]:
from collatex import *
collation = Collation()
witness_USA = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-USA.txt", encoding='utf-8' ).read()
witness_UK = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-UK.txt", encoding='utf-8' ).read()
witness_EM = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-EM.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "USA", witness_USA )
collation.add_plain_witness( "UK", witness_UK )
collation.add_plain_witness( "EM", witness_EM )
alignment_table = collate(collation, output='html2')

USA,UK,EM
"When she looked in the glass and saw her hair grey, her cheek sunk, at fifty, she thought, possibly she might have managed things better—her husband; money; his books. But for her own part she would never for a single second regret her decision, evade difficulties, or slur over duties. She was now formidable to behold, and it was only in silence, looking up from their plates, after she had spoken so severely about Charles Tansley, that her daughters","When she looked in the glass and saw her hair grey, her cheek sunk, at fifty, she thought, possibly she might have managed things better—her husband; money; his books. But for her own part she would never for a single second regret her decision, evade difficulties, or slur over duties. She was now formidable to behold, and it was only in silence, looking up from their plates, after she had spoken so severely about Charles Tansley, that her daughters","When she looked in the glass and saw her hair grey, her cheek sunk, at fifty, she thought, possibly she might have managed things better—her husband; money; his books. But for her own part she would never for a single second regret her decision, evade difficulties, or slur over duties. She was now formidable to behold, and it was only in silence, looking up from their plates, after she had spoken so severely about Charles Tansley, that her daughters"
",",—,—
"Prue, Nancy, Rose—could sport with infidel ideas which they had brewed for themselves of a life different from hers; in Paris, perhaps; a wilder life; not always taking care of some man or other; for there was in all their minds a mute questioning of deference and chivalry, of the Bank of England and the Indian Empire, of ringed fingers and lace, though to them all there was something in this of the essence of beauty, which called out the manliness in their girlish hearts, and made them, as they sat at table beneath their mother’s eyes, honour her strange severity, her extreme courtesy, like a","Prue, Nancy, Rose—could sport with infidel ideas which they had brewed for themselves of a life different from hers; in Paris, perhaps; a wilder life; not always taking care of some man or other; for there was in all their minds a mute questioning of deference and chivalry, of the Bank of England and the Indian Empire, of ringed fingers and lace, though to them all there was something in this of the essence of beauty, which called out the manliness in their girlish hearts, and made them, as they sat at table beneath their mother’s eyes, honour her strange severity, her extreme courtesy, like a","Prue, Nancy, Rose—could sport with infidel ideas which they had brewed for themselves of a life different from hers; in Paris, perhaps; a wilder life; not always taking care of some man or other; for there was in all their minds a mute questioning of deference and chivalry, of the Bank of England and the Indian Empire, of ringed fingers and lace, though to them all there was something in this of the essence of beauty, which called out the manliness in their girlish hearts, and made them, as they sat at table beneath their mother’s eyes, honour her strange severity, her extreme courtesy, like a"
Queen,Queen,queen
’s raising from the mud,’s raising from the mud,’s raising from the mud
to wash,-,-
a beggar,a beggar,a beggar
’,’,'
s dirty foot,s dirty foot,s dirty foot
-,and washing,and washing


## 3. Third exercise (the sonnet about writing a sonnet). Read from files and HTML2 output.

You now know how to collate texts stored in files. Try it with other material from the data directory: the sonnet about writing a sonnet, which you used when you started encoding in TEI. Collate the two versions of the sonnet.




In [10]:
from collatex import *
collation = Collation()
witness_1707 = open( "../data/sonnet/Lope_soneto_FR_1707.txt", encoding='utf-8' ).read()
witness_1822 = open( "../data/sonnet/Lope_soneto_FR_1822.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "witness1707", witness_1707 )
collation.add_plain_witness( "witness1822", witness_1822 )
alignment_table = collate(collation, output='html2')

witness1707,witness1822
"Doris, qui sait qu'aux vers quelquefois je me plais, Me demande un sonnet","Doris, qui sait qu'aux vers quelquefois je me plais, Me demande un sonnet"
;,","
"et je m'en désespère: Quatorze vers, grand Dieu!","et je m'en désespère: Quatorze vers, grand Dieu!"
le,Le
moyen de les faire,moyen de les faire
!,?
En voilà cependant déjà quatre de faits. Je ne pouvais d'abord trouver de rime,En voilà cependant déjà quatre de faits. Je ne pouvais d'abord trouver de rime
;,","
mais,mais
",",-
