I'd like to get an overview of thematic changes in the newspaper Postimees between 1999 and 2000.
I'll download the files from [Reference corpus of Estonian: Postimees](http://www.cl.ut.ee/korpused/segakorpus/postimees/failid/xml/Postimees.zip) and unzip them.

To get the word counts, I'll use the Python libraries `scrapy` and `pandas`.

In [1]:
import glob
from collections import Counter

import pandas as pd
import scrapy


def make_csv(dirname):
    """Count the tokens in all XML files under dirname and save them to dirname.csv."""
    counts = Counter()

    for path in glob.glob(dirname + '/*.xml'):
        # Close each file after reading; UTF-8 is an assumption about the corpus files.
        with open(path, encoding='utf-8') as f:
            tree = scrapy.selector.Selector(text=f.read())
        tree.remove_namespaces()
        # Every <s> element holds one sentence; split it on spaces and skip empty tokens.
        for sentence in tree.xpath('//s/text()').extract():
            counts.update(token for token in sentence.split(' ') if token)

    df = pd.DataFrame.from_records(list(counts.items()), columns=['word', 'count'], index='word')
    df.sort_values('count', ascending=False).to_csv(dirname + '.csv')

In [3]:
make_csv('postimees_2000')
make_csv('postimees_1999')
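
To see what the counting step inside `make_csv` does without downloading the corpus, here is a minimal sketch on a made-up XML snippet. It uses the standard library's `xml.etree.ElementTree` instead of `scrapy`, and `SAMPLE` and `count_tokens` are hypothetical names introduced only for this illustration. Note that `s.text` only returns the text before the first child element, so this is a simplification of the `//s/text()` query above.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Made-up sample, not taken from the corpus.
SAMPLE = """<text>
  <p><s>Kosovo kriis jätkub</s><s>NATO  lennukid Kosovo kohal</s></p>
</text>"""


def count_tokens(xml_text):
    """Count space-separated tokens in every <s> element of xml_text."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    for s in root.iter('s'):
        # Splitting on a single space yields empty strings for double spaces; skip them.
        counts.update(tok for tok in (s.text or '').split(' ') if tok)
    return counts


counts = count_tokens(SAMPLE)
print(counts.most_common(1))
```

The double space in the second sentence shows why the empty tokens have to be filtered out before counting.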

Now I should have two wordlists, `postimees_1999.csv` and `postimees_2000.csv`.
I'll run the volcanoplot command-line program and filter out words that occurred fewer than 200 times in total.

```bash
python volcanoplot.py postimees_1999.csv postimees_2000.csv postimees.html \
       --header True --filter_below_total_count 200

```
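
How volcanoplot implements `--filter_below_total_count` is not shown here. The sketch below assumes that "total" means the sum of a word's counts over both wordlists, and the function name and the numbers are made up for illustration.

```python
from collections import Counter


def filter_below_total_count(counts_a, counts_b, threshold):
    """Keep words whose summed count over both lists reaches the threshold."""
    total = counts_a + counts_b  # Counter addition sums the counts per word
    return {
        word: (counts_a[word], counts_b[word])
        for word, n in total.items()
        if n >= threshold
    }


# Made-up counts, not taken from the corpus.
counts_1999 = Counter({'Kosovo': 350, 'Sydney': 20, 'lund': 5})
counts_2000 = Counter({'Kosovo': 40, 'Sydney': 400, 'plaat': 3})

kept = filter_below_total_count(counts_1999, counts_2000, 200)
print(kept)
```

Under this assumption, a word that is rare in one year still survives the filter if it is frequent enough in the other, which is exactly the kind of word the comparison is after.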

The file postimees.html should look like this:

![](../_static/volcano.png)
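
A volcano plot generally places each item by effect size on one axis and statistical significance on the other. The exact statistics volcanoplot computes are not stated here, so the following is only a sketch under assumptions: it uses a smoothed log2 ratio of relative frequencies as the effect size and a simplified log-likelihood (G2) score as the significance measure. The function names, the smoothing constant and the numbers are all hypothetical.

```python
import math


def log2_ratio(count_a, count_b, size_a, size_b, smoothing=0.5):
    """Log2 ratio of relative frequencies; smoothing avoids division by zero."""
    rate_a = (count_a + smoothing) / size_a
    rate_b = (count_b + smoothing) / size_b
    return math.log2(rate_a / rate_b)


def log_likelihood(count_a, count_b, size_a, size_b):
    """Simplified log-likelihood (G2) score; larger means a less even split."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Made-up numbers: a word seen 350 times in one corpus and 40 times in another,
# both corpora assumed to contain one million tokens.
x = log2_ratio(350, 40, 1_000_000, 1_000_000)
y = log_likelihood(350, 40, 1_000_000, 1_000_000)
print(f"log2 ratio = {x:.2f}, G2 = {y:.1f}")
```

With these two measures, words typical of one corpus end up far from the centre on the horizontal axis and high on the vertical axis, while evenly distributed words stay near the origin.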

Exporting the words in regions "1" and "3" of the plot gives two wordlists, one for each year.

These can give us some ideas about the most important themes of these two years. 

1999 was a year of local elections and, as such, the names of political parties are dominant (Isamaa, Isamaaliit, Keskerakond, Koonderakond, Erakond, Erakonna, Maarahva, Reformierakond, etc.).

There was a war in Kosovo, so "Jugoslaavia", "Serbia", "albaanlaste", "Kosovo", "NATO" and "OSCE" are overrepresented.

In 2000, there was a presidential election in the USA ("Bush", "Gore"), "Putin" came to power in Russia, and it was the summer of the "Sydney" Olympics ("olümpiamängud"). There were also the ["NRG" deal](https://et.wikipedia.org/wiki/NRG-tehing) ("Elektrijaamade", "Energia", "Narva") and the privatization of "Edelaraudtee".


| 1999              |  2000                                                 |
|-------------------|-------------------------------------------------------|
| AB                |            'i                                         |
| AIN               |            2002.                                      |
| AITA              |            2003.                                      |
| ALVELA            |            Annika                                     |
| ANDRUS            |            Argo                                       |
| Allik             |            Austraalias                                |
| Arvamus           |            Austria                                    |
| BC                |            BBC                                        |
| BNSile            |            Barcelona                                  |
| Dagestani         |            Bush                                       |
| Draamateatri      |            CDU                                        |
| ELO               |            Edelaraudtee                               |
| EME               |            Elektrijaamade                             |
| ERILAID           |            Energia                                    |
| Erakond           |            Erkki                                      |
| Erakonna          |            Gore                                       |
| Eve               |            Kadri                                      |
| Hiinas            |            Kaljuste                                   |
| Indoneesia        |            Kanepi                                     |
| Iraagi            |            Kerdi                                      |
| Isamaa            |            Kert                                       |
| Isamaaliidu       |            Loodus                                     |
| Jeltsin           |            Microsofti                                 |
| Jugoslaavia       |            NRG                                        |
| KAAREL            |            NTV                                        |
| KAAS              |            Narva                                      |
| KALLAS            |            Portugali                                  |
| KERSTI            |            Putin                                      |
| Kallo             |            Putini                                     |
| Keskerakond       |            Pärnoja                                    |
| Keskerakonna      |            Sydney                                     |
| Kesklinna         |            Sydneys                                    |
| Keskturu          |            Veiko                                      |
| Kindlustuse       |            Visnapuu                                   |
| Kodu              |            ajalehes                                   |
| Koonderakond      |            ajaleht                                    |
| Koonderakonna     |            andrus.nilk@postimees.ee                   |
| Kosovo            |            argo.ideon@postimees.ee                    |
| Kosovos           |            demokraatide                               |
| Kosovosse         |            elektri                                    |
| Kristiine         |            elektrijaamade                             |
| KÜLLIKE           |            erkki.erilaid@postimees.ee                 |
| Lasnamäe          |            haldusreformi                              |
| Liivak            |            harli.uljas@postimees.ee                   |
| MARGUS            |            keskpanga                                  |
| MERISALU          |            kohusetäitja                               |
| Maarahva          |            kõrghariduse                               |
| Mait              |            kütuse                                     |
| Metsamaa          |            lk                                         |
| Muuga             |            luule                                      |
| Mõõdukate         |            mainis                                     |
| N                 |            marek.laane@postimees.ee                   |
| NATO              |            olümpial                                   |
| NEIMAN            |            olümpiamängude                             |
| NIITRA            |            plaadi                                     |
| NILK              |            plaat                                      |
| NILS              |            poisi                                      |
| Norma             |            presidendiks                               |
| OSCE              |            priit.rajalo@postimees.ee                  |
| OTTAS             |            reformi                                    |
| PAJU              |            rein.karner@postimees.ee                   |
| PUTTING           |            riigikaitse                                |
| Pirita            |            sport@postimees.ee                         |
| Punase            |            tonu.kees@postimees.ee                     |
| ROOVÄLI           |            uudised@postimees.ee                       |
| Reformierakond    |            valis@postimees.ee                         |
| Reformierakonna   |            vangla                                     |
| Riigikokku        |            ülikoolide                                 |
| Sadama            |            üüri                                       |
| Saporta           |                                                       |
| Savisaare         |                                                       |
| Seli              |                                                       |
| Serbia            |                                                       |
| Siimann           |                                                       |
| Siimanni          |                                                       |
| Slobodan          |                                                       |
| TAIVO             |                                                       |
| TARMO             |                                                       |
| TEET              |                                                       |
| TIIT              |                                                       |
| Telekomi          |                                                       |
| URMAS             |                                                       |
| VEIKO             |                                                       |
| VIRKI             |                                                       |
| VISNAPUU          |                                                       |
| Vabaduse          |                                                       |
| Vare              |                                                       |
| Veerpalu          |                                                       |
| Volikogu          |                                                       |
| Väli              |                                                       |
| WTO               |                                                       |
| aastatuhande      |                                                       |
| aktsia            |                                                       |
| aktsiaseltsi      |                                                       |
| albaanlaste       |                                                       |
| börsi             |                                                       |
| börsil            |                                                       |
| erakond           |                                                       |
| erakondade        |                                                       |
| erakonna          |                                                       |
| erakonnad         |                                                       |
| halduskogu        |                                                       |
| hääle             |                                                       |
| hääli             |                                                       |
| häält             |                                                       |
| i                 |                                                       |
| investorid        |                                                       |
| investorite       |                                                       |
| kasvõi            |                                                       |
| kohalikel         |                                                       |
| kolmikliidu       |                                                       |
| kriis             |                                                       |
| kriisi            |                                                       |
| käibe             |                                                       |
| l                 |                                                       |
| lennukid          |                                                       |
| linnaosa          |                                                       |
| lund              |                                                       |
| müüri             |                                                       |
| nimekiri          |                                                       |
| nimekirja         |                                                       |
| nimekirjas        |                                                       |
| novembri          |                                                       |
| novembril         |                                                       |
| nr                |                                                       |
| oktoobri          |                                                       |
| oktoobril         |                                                       |
| paarisnumbrid     |                                                       |
| paaritud          |                                                       |
| parlamenti        |                                                       |
| seltsi            |                                                       |
| serblaste         |                                                       |
| säästueelarve     |                                                       |
| teede             |                                                       |
| torni             |                                                       |
| tulumaksu         |                                                       |
| turu              |                                                       |
| tänavate          |                                                       |
| valija            |                                                       |
| valijad           |                                                       |
| valijate          |                                                       |
| valimisi          |                                                       |
| valimiskampaania  |                                                       |
| valimiste         |                                                       |
| valimistel        |                                                       |
| volikogu          |                                                       |
| volikogus         |                                                       |
| x                 |                                                       |
| Ühendatud         |                                                       |
