# Exploring the National Energy and Climate Plans

## What did the exploration show?
- The main step before the exploration was preprocessing the data appropriately, taking the assumed document structure into account. There are indeed substantial differences between individual sections and dimensions, so analysing each document by its components makes sense.
- The data also show differences between the documents of individual countries, which is a good sign for further work aimed at more detailed comparisons between EU member states.
- New research questions emerged, e.g. about differences in how transport is treated in the decarbonisation process.
- Words were identified that should be considered for inclusion as stop words.
- Next steps were identified: trying to improve the text extraction from the PDFs (without tables, charts and page numbers).
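
Improving the text extracted from the PDFs could start from a simple line filter. The sketch below is only an illustration under assumptions: the names `clean_extracted_text` and `looks_like_table_row` and both heuristics (a page number is a line holding only a short number, optionally written as `Page N` or `- N -`; a table row is a line where most tokens are numeric) are hypothetical and not part of this notebook.

```python
import re

# Hypothetical heuristics, not taken from the notebook.
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?-?\s*\d{1,4}\s*-?\s*$", re.IGNORECASE)
NUMERIC_TOKEN = re.compile(r"^[\d.,%+\-]+$")


def looks_like_table_row(line: str, threshold: float = 0.6) -> bool:
    """Return True if most tokens in the line are numbers (typical for table rows)."""
    tokens = line.split()
    if len(tokens) < 3:
        return False
    numeric = sum(1 for t in tokens if NUMERIC_TOKEN.match(t))
    return numeric / len(tokens) >= threshold


def clean_extracted_text(text: str) -> str:
    """Drop page-number lines and table-like rows from text extracted from a PDF."""
    kept = []
    for line in text.splitlines():
        if PAGE_NUMBER.match(line) or looks_like_table_row(line):
            continue
        kept.append(line)
    return "\n".join(kept)


sample = (
    "The share of renewables will increase.\n"
    "12\n"
    "2020 2025 2030 45.1 50.2\n"
    "Page 3\n"
    "Transport measures follow."
)
print(clean_extracted_text(sample))
```

A filter like this will also drop a line that consists of a single year, so the thresholds would need tuning on real NECP pages.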

## Imports

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
! pip install swifter
! pip install matplotlib==3.4.0
! pip install textacy
! pip install thinc
! pip install gensim
! pip install pyLDAvis

In [None]:
!python -m spacy download en_core_web_lg
# the runtime has to be restarted after the download

In [None]:
import pandas as pd
import numpy as np
import spacy
from gensim.corpora.dictionary import Dictionary
from gensim.models.ldamulticore import LdaMulticore

import pyLDAvis.gensim_models
pyLDAvis.enable_notebook()

In [None]:
en = spacy.load("en_core_web_lg")

In [None]:
import os
import pickle
from collections import Counter
from tqdm import tqdm
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="whitegrid")
import plotly.express as px

In [None]:
DIR = '/content/drive/MyDrive/NLP-klimat/'

## Analysis

### Loading the data
The texts were extracted from the PDFs based on an earlier annotation of the individual documents.

In [None]:
NECP_annotations = pd.read_csv(DIR+'NECP.txt')

In [None]:
NECP_annotations = NECP_annotations.replace({"None": None})

#### Intro: what is an NECP?

**NECP** - National Energy and Climate Plan (in Polish: Krajowy plan na rzecz energii i klimatu)

To meet the energy and climate targets set by the European Union for 2030, member states were required to establish a 10-year national energy and climate plan (NECP) covering the period from 2021 to 2030.

**NECP structure**

![](https://i.ibb.co/nD280yN/necp-structure.png)


So for each of the 27 member states we get the following sections:
- Overview and Process for Establishing the Plan
- National Objectives and Targets
- Policies and Measures
- Current Situation and Reference Projections (the current situation and projections under existing climate policy)
- Impact Assessment of Planned Policies and Measures

According to the template structure, sections 2-5 should be divided into 5 dimensions:
- Decarbonisation
- Energy efficiency
- Energy security
- Internal market (internal energy market)
- R&I and Competitiveness (research, innovation and competitiveness)

In practice, in most plans the Impact Assessment of Planned Policies and Measures section is not divided into the 5 dimensions.

In [None]:
necp_processed = pd.read_csv(DIR+'necp_processed.csv', index_col = 0)

Columns of the imported data frame.

In [None]:
necp_processed.columns

In [None]:
necp_processed.drop(['start_page', 'end_page', 'start_text', 'end_text'], axis = 1, inplace = True)

In [None]:
# drop the rows that have no text
necp_processed.dropna(subset=['text'], inplace=True)

In [None]:
len(necp_processed)

453 document parts remain.

##### Processing the texts

In [None]:
import swifter
import warnings
warnings.filterwarnings("default")

In [None]:
tqdm.pandas()
necp_docs = necp_processed['text'].swifter.apply(en)

In [None]:
# export the processed documents
with open(DIR + 'necp_docs_lg.pickle', 'wb') as f:
  pickle.dump(necp_docs, f)

##### Loading

In [None]:
# load the processed documents
with open(DIR + 'necp_docs_lg.pickle', 'rb') as f:
    necp_docs = pickle.load(f)  # same name as used by the cells below

In [None]:
countries_stop_words = ['Austria', 'Austrian', 'Belgium', 'Belgian', 'Bulgaria', 'Bulgarian', 'Czech', 'Cyprus', 'Cypriot', 'Germany', 'German',
                      'Denmark', 'Danish', 'Estonia', 'Estonian', 'Croatia', 'Croatian', 'Finland', 'Finnish', 'France', 'French', 'Malta', 'Maltese',
                      'Luxembourg', 'Lithuania', 'Lithuanian', 'Latvia', 'Latvian', 'Italy', 'Italian', 'Ireland', 'Irish', 'Hungary', 'Hungarian',
                      'Greece', 'Greek', 'Spain', 'Spanish', 'Netherlands', 'Dutch', 'Poland', 'Polish', 'Portugal', 'Portuguese', 'Romania', 'Romanian',
                      'Sweden', 'Swedish', 'Slovenia', 'Slovenian', 'Slovakia', 'Slovak']

extra_stop_words =  ['energy', 'figure', 'table', 'plan', "necp", 'national', 'use', "measure", "sector", "climate",
                     "plan", "dimension", "integrated", "section", "republic", "measures", "policies", "target", "objective", "policy",
                     "projection", "assessment", "federal", "government"]

necp_processed["necp_lemmas"] = necp_docs.swifter.apply(lambda doc: [token.lemma_ for token in doc 
                                                             if not token.is_stop 
                                                             if not token.is_punct
                                                             if not (token.lemma_ in countries_stop_words) 
                                                             if not (token.lemma_.lower() in extra_stop_words) 
                                                             if token.is_alpha])

In [None]:
from gensim.models import Phrases
bigram = Phrases(necp_processed["necp_lemmas"], min_count=20)
for idx in necp_processed["necp_lemmas"].index:
    for token in bigram[necp_processed["necp_lemmas"][idx]]:
        if '_' in token:
            necp_processed["necp_lemmas"][idx].append(token)

In [None]:
def plot_counter(counter: list, orient: str = 'h', color: str='lightblue', figsize: tuple=(20,13)):
  # counter is a list of (lemma, count) pairs, e.g. the result of Counter.most_common()
  plt.figure(figsize=figsize)
  keys = [k[0] for k in counter]
  vals = [int(k[1]) for k in counter]
  ax = sns.barplot(x=vals, y=keys, orient=orient, color=color)
  ax.bar_label(ax.containers[0])
  return ax

In [None]:
from gensim.models import CoherenceModel

### **Dimension**: Decarbonisation

In [None]:
decarbonisation_docs = necp_processed[(necp_processed['energy_union_dimension'] == "Decarbonisation")]["necp_lemmas"]
decarbonisation_counter = Counter(decarbonisation_docs.sum()).most_common(30)
plot_counter(decarbonisation_counter)
plt.show()

In [None]:
decarbonisation_docs = decarbonisation_docs.apply(lambda doc: [lemma for lemma in doc if not (lemma in ['emission', 'renewable'])])
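
The lemmas removed in the cell above were picked by looking at the frequency plot. A complementary way to find candidates, sketched here as an assumption rather than as the method used in this notebook, is to list the lemmas that occur in almost every document of a dimension. The helper name `frequent_lemma_candidates` and the thresholds are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def frequent_lemma_candidates(docs: Iterable[List[str]], min_doc_share: float = 0.9) -> List[str]:
    """Return lemmas that appear in at least `min_doc_share` of the documents."""
    docs = list(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each lemma once per document
    n_docs = len(docs)
    return sorted(lemma for lemma, df in doc_freq.items() if df / n_docs >= min_doc_share)


toy_docs = [
    ["emission", "transport", "emission"],
    ["emission", "forest"],
    ["emission", "transport", "tax"],
]
print(frequent_lemma_candidates(toy_docs, min_doc_share=0.9))  # ['emission']
```

The function accepts any iterable of token lists, so a pandas Series such as `decarbonisation_docs` can be passed directly. Since `filter_extremes` is called with `no_above=1.0` in this notebook, the dictionary itself does not remove such near-universal lemmas.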

In [None]:
decarbonisation_dictionary = Dictionary(decarbonisation_docs)
decarbonisation_dictionary.filter_extremes(no_below=2, no_above=1.0)
decarbonisation_encoded_docs = decarbonisation_docs.apply(decarbonisation_dictionary.doc2bow)

In [None]:
decarbonisation_models = []
for topics_number in tqdm(range(3, 13)):
    lda = LdaMulticore(decarbonisation_encoded_docs, num_topics=topics_number, passes=8, iterations=100, random_state=123)
    decarbonisation_models.append(lda)

In [None]:
decarbonisation_cvs = []
for model in tqdm(decarbonisation_models):
    cm = CoherenceModel(model,texts=decarbonisation_docs, dictionary=decarbonisation_dictionary)
    c_v = cm.get_coherence()
    decarbonisation_cvs.append(c_v)

In [None]:
px.line(x=range(3, 13), y=decarbonisation_cvs)
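
The models are trained for 3 to 12 topics, so position `k` in `decarbonisation_models` holds the model with `k + 3` topics (index 4 is the 7-topic model). A small sketch of that mapping follows; the function names and the toy scores are made up for illustration.

```python
from typing import List, Sequence, Tuple

FIRST_TOPIC_COUNT = 3  # matches range(3, 13) used when training the models


def index_to_topic_count(index: int, start: int = FIRST_TOPIC_COUNT) -> int:
    """Translate a position in the model list into the number of topics."""
    return index + start


def rank_by_coherence(scores: Sequence[float], start: int = FIRST_TOPIC_COUNT) -> List[Tuple[int, float]]:
    """Return (num_topics, score) pairs sorted from highest to lowest coherence."""
    pairs = [(index_to_topic_count(i, start), s) for i, s in enumerate(scores)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


toy_scores = [0.41, 0.44, 0.43, 0.47, 0.52, 0.50, 0.49, 0.48, 0.46, 0.45]  # made-up values
print(rank_by_coherence(toy_scores)[0])  # (7, 0.52)
```

The coherence score is only one input; the topics still need to be inspected by eye, for example in the pyLDAvis view.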

In [None]:
vis = pyLDAvis.gensim_models.prepare(decarbonisation_models[4], decarbonisation_encoded_docs, dictionary=decarbonisation_dictionary)
vis

In [None]:
for idx, topic in decarbonisation_models[4].show_topics(formatted=False, num_words=15):
    print('Topic: {} \nWords: {}'.format(idx, [decarbonisation_dictionary[int(w[0])] for w in topic]))

- Topic 0: gas, source, support, fuel, forest land
- Topic 1: renewable energy sources, development, new, power generation, electricity
- Topic 2: share, increase, consumption, total, GHG, gross, decrease
> 'The Commission envisions the EU as the global hub for developing next-generation renewable energies. It aims to make the EU the world leader in the sector through preparing markets and grids for a growing proportion of renewable energy, and investing in advanced, sustainable alternative fuels.'
- Topic 3: electricity, transport, vehicle -- to be dropped (0% of tokens)
- Topic 4: transport, promote, support, public, system, vehicle, fuel, tax, mobility, encourage
- Topic 5: Act, system, funding, expansion (grid expansion)
- Topic 6: heat, waste, gas, district heating, biomass
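
A topic that accounts for 0% of tokens carries almost no weight in the corpus. As a rough sketch of how such a share can be estimated from document-topic probabilities and document lengths (it is assumed here, not verified, that pyLDAvis derives its topic sizes in a similar way), with hypothetical names and toy numbers:

```python
from typing import List, Sequence


def topic_token_share(doc_topic: Sequence[Sequence[float]], doc_lengths: Sequence[int]) -> List[float]:
    """Estimate each topic's share of all tokens.

    doc_topic[d][k] is the probability of topic k in document d,
    doc_lengths[d] is the number of tokens in document d.
    """
    n_topics = len(doc_topic[0])
    totals = [0.0] * n_topics
    for probs, length in zip(doc_topic, doc_lengths):
        for k, p in enumerate(probs):
            totals[k] += p * length  # expected number of tokens from topic k
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def negligible_topics(shares: Sequence[float], threshold: float = 0.005) -> List[int]:
    """Return ids of topics whose token share is below the threshold."""
    return [k for k, s in enumerate(shares) if s < threshold]


toy_doc_topic = [[0.7, 0.3, 0.0], [0.2, 0.8, 0.0]]  # made-up probabilities
toy_lengths = [100, 300]
shares = topic_token_share(toy_doc_topic, toy_lengths)
print(negligible_topics(shares))  # [2]
```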



In [None]:
from matplotlib import colors
topics = decarbonisation_models[4].show_topics(formatted=False)
counter = Counter(decarbonisation_docs.sum())

out = []
for i, topic in topics:
    for word, weight in topic:
        word = decarbonisation_dictionary[int(word)]
        out.append([word, i , weight, counter[word]])

df = pd.DataFrame(out, columns=['word', 'topic_id', 'importance', 'word_count'])        

fig, axes = plt.subplots(3, 2, figsize=(14,14), sharey=True)
cols = [color for name, color in colors.TABLEAU_COLORS.items()]
for i, ax in enumerate(axes.flatten()):
    if i>=3:
      i+=1
    ax.bar(x='word', height="word_count", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.5, alpha=0.3, label='Word Count')
    ax_twin = ax.twinx()
    ax_twin.bar(x='word', height="importance", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.2, label='Weights')
    ax.set_ylabel('Word Count', color=cols[i])
    ax_twin.set_ylim(0, 0.018); ax.set_ylim(0, 3000)
    ax.set_title('Topic: ' + str(i), color=cols[i], fontsize=12)
    ax.tick_params(axis='y', left=False)
    ax.set_xticklabels(df.loc[df.topic_id==i, 'word'], rotation=30, horizontalalignment= 'right')
    ax.legend(loc='upper left'); ax_twin.legend(loc='upper right')
    ax.grid(False)
    ax_twin.grid(False)
fig.suptitle('Topics for dimension: Decarbonisation', fontsize=16)    
fig.tight_layout()    
plt.show()

In [None]:
decarbonisation_corpus_model = decarbonisation_models[4][decarbonisation_encoded_docs]

In [None]:
decarbonisation_metainfo = necp_processed[(necp_processed['energy_union_dimension'] == "Decarbonisation")]
res_len = len(decarbonisation_metainfo)
res = np.zeros((res_len, 7))

In [None]:
for i, doc in enumerate(decarbonisation_corpus_model):
  for topic in doc:
    res[i][topic[0]] = np.round(topic[1], 4)
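
The loop above fills a dense matrix from gensim's sparse output, in which each document is a list of `(topic_id, probability)` pairs and topics with a very low probability may be missing (they stay at 0). A library-free sketch of the same conversion, with a hypothetical function name and toy input:

```python
from typing import List, Sequence, Tuple


def to_dense(sparse_docs: Sequence[Sequence[Tuple[int, float]]], num_topics: int, ndigits: int = 4) -> List[List[float]]:
    """Convert per-document (topic_id, probability) pairs into one dense row per document."""
    dense = [[0.0] * num_topics for _ in sparse_docs]
    for row, doc in zip(dense, sparse_docs):
        for topic_id, prob in doc:
            row[topic_id] = round(prob, ndigits)
    return dense


toy_sparse = [[(0, 0.25), (2, 0.75)], [(1, 1.0)]]  # made-up values
print(to_dense(toy_sparse, num_topics=3))  # [[0.25, 0.0, 0.75], [0.0, 1.0, 0.0]]
```

Because of the omitted topics and the rounding, the rows of the real matrix need not sum to exactly 1.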

In [None]:
decarbonisation_modeling_results = pd.concat([decarbonisation_metainfo.reset_index(drop=True), pd.DataFrame(res)], axis=1)
decarbonisation_topic_probs = decarbonisation_modeling_results.groupby("country").mean().loc[:,[0, 1, 2, 4, 5, 6]]

In [None]:
decarbonisation_modeling_results.groupby("subsection").mean().loc[:,[0, 1, 2, 4, 5, 6]]

In [None]:
decarbonisation_topic_probs

In [None]:
import scipy.spatial as sp
import scipy.cluster.hierarchy as hc
linkage = hc.linkage(decarbonisation_topic_probs, method='average', metric='cosine')
decarbonisation_similarities = sp.distance.squareform(sp.distance.pdist(decarbonisation_topic_probs.values, metric='cosine'))

In [None]:
sns.clustermap(1-decarbonisation_similarities, 
            xticklabels=decarbonisation_topic_probs.index, 
            yticklabels=decarbonisation_topic_probs.index,
            row_linkage=linkage, col_linkage=linkage,
            figsize=(12, 8))  # clustermap creates its own figure, so the size is set here
plt.show()

In [None]:
decarbonisation_comparison = decarbonisation_modeling_results.groupby(["country", "subsection"]).mean().loc[:,0:6]

In [None]:
countries = decarbonisation_modeling_results.country.unique()
sections = ["Policies and Measures", "National Objectives and Targets"]

In [None]:
decarbonisation_change = {"country": [], "similarity": []}
for country in countries:
  pm = decarbonisation_modeling_results.loc[(decarbonisation_modeling_results["country"] == country) &
                                        (decarbonisation_modeling_results["subsection"] == sections[0])].loc[:,0:6]
  noat = decarbonisation_modeling_results.loc[(decarbonisation_modeling_results["country"] == country) & 
                                        (decarbonisation_modeling_results["subsection"] == sections[1])].loc[:,0:6]
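  if pm.shape[0]==1 and noat.shape[0]==1:
    decarbonisation_change["country"].append(country) 
    # flatten the single-row frames to 1-D vectors before computing the cosine distance
    decarbonisation_change["similarity"].append(1-sp.distance.cosine(pm.values.ravel(), noat.values.ravel()))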
pd.DataFrame(decarbonisation_change)
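
The similarity computed above is one minus the cosine distance between a country's topic distribution in Policies and Measures and in National Objectives and Targets. For reference, a plain-Python sketch of that quantity (the function name and the vectors are illustrative only):

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


policies = [0.5, 0.5, 0.0]    # made-up topic probabilities
objectives = [0.5, 0.5, 0.0]
other = [0.0, 0.0, 1.0]
print(round(cosine_similarity(policies, objectives), 6))  # 1.0
print(round(cosine_similarity(policies, other), 6))       # 0.0
```

A value close to 1 means the two sections of a plan put weight on the same topics; a value close to 0 means they barely overlap.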

### **Dimension:** Energy efficiency  

In [None]:
energy_efficiency_docs = necp_processed[(necp_processed['energy_union_dimension'] == "Energy efficiency")]["necp_lemmas"]
energy_efficiency_counter = Counter(energy_efficiency_docs.sum()).most_common(30)
plot_counter(energy_efficiency_counter)
plt.show()

In [None]:
energy_efficiency_docs = energy_efficiency_docs.apply(lambda doc: [lemma for lemma in doc if not (lemma in ['building', 'efficiency', 'consumption'])])

In [None]:
energy_efficiency_dictionary = Dictionary(energy_efficiency_docs)
energy_efficiency_dictionary.filter_extremes(no_below=2, no_above=1.0)
energy_efficiency_encoded_docs = energy_efficiency_docs.apply(energy_efficiency_dictionary.doc2bow)

In [None]:
energy_efficiency_models = []
for topics_number in tqdm(range(3, 13)):
    lda = LdaMulticore(energy_efficiency_encoded_docs, num_topics=topics_number, passes=8, iterations=100, random_state=123)
    energy_efficiency_models.append(lda)

In [None]:
energy_efficiency_cvs = []
for model in tqdm(energy_efficiency_models):
    cm = CoherenceModel(model,texts=energy_efficiency_docs, dictionary=energy_efficiency_dictionary)
    c_v = cm.get_coherence()
    energy_efficiency_cvs.append(c_v)

In [None]:
px.line(x=range(3, 13), y=energy_efficiency_cvs)

In [None]:
vis = pyLDAvis.gensim_models.prepare(energy_efficiency_models[8], energy_efficiency_encoded_docs, dictionary=energy_efficiency_dictionary)
vis

In [None]:
for idx, topic in energy_efficiency_models[8].show_topics(formatted=False, num_words=15, num_topics=11):
    print('Topic: {} \nWords: {}'.format(idx, [energy_efficiency_dictionary[int(w[0])] for w in topic]))

- Topic 0: final, final_consumption, primary, saving
- Topic 1: Directive, saving, renovation, strategy, long
- Topic 2: PPM, PPM scenario, power generation, scenario, final consumption
- Topic 3: heating, heat, potential, final, requirement
- Topic 4: public, saving, final -- to be dropped (0% of tokens)
- Topic 5: ktoe (kilotonne of oil equivalent), scenario, fuel, baseline
- Topic 6: K, W, maximum, EEOS (Energy Efficiency Obligation Scheme)
- Topic 7: TWh, coal, gas, oil, Consommation
- Topic 8: public, implementation, system, instrument
- Topic 9: public, vehicle, autonomous, autonomous community
- Topic 10: renovation, support, work, founding



In [None]:
from matplotlib import colors
topics = energy_efficiency_models[8].show_topics(formatted=False, num_topics=11)
counter = Counter(energy_efficiency_docs.sum())

out = []
for i, topic in topics:
    for word, weight in topic:
        word = energy_efficiency_dictionary[int(word)]
        out.append([word, i , weight, counter[word]])

df = pd.DataFrame(out, columns=['word', 'topic_id', 'importance', 'word_count'])        

fig, axes = plt.subplots(5, 2, figsize=(14, 20), sharey=True)
cols = [color for name, color in colors.TABLEAU_COLORS.items()]
cols.append(cols[4])

for i, ax in enumerate(axes.flatten()):
    if i>=4:
      i+=1
    ax.bar(x='word', height="word_count", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.5, alpha=0.3, label='Word Count')
    ax_twin = ax.twinx()
    ax_twin.bar(x='word', height="importance", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.2, label='Weights')
    ax.set_ylabel('Word Count', color=cols[i])
    ax_twin.set_ylim(0, 0.045);
    ax.set_ylim(0, 2500)
    ax.set_title('Topic: ' + str(i), color=cols[i], fontsize=12)
    ax.tick_params(axis='y', left=False)
    ax.set_xticklabels(df.loc[df.topic_id==i, 'word'], rotation=30, horizontalalignment= 'right')
    ax.legend(loc='upper left'); ax_twin.legend(loc='upper right')
    ax.grid(False)
    ax_twin.grid(False)
fig.suptitle('Topics for dimension: Energy efficiency', fontsize=16)    
fig.tight_layout()    
plt.show()

In [None]:
energy_efficiency_corpus_model = energy_efficiency_models[8][energy_efficiency_encoded_docs]

In [None]:
energy_efficiency_metainfo = necp_processed[(necp_processed['energy_union_dimension'] == "Energy efficiency")]
res_len = len(energy_efficiency_metainfo)
res = np.zeros((res_len, 11))

In [None]:
for i, doc in enumerate(energy_efficiency_corpus_model):
  for topic in doc:
    res[i][topic[0]] = np.round(topic[1], 4)

In [None]:
energy_efficiency_modeling_results = pd.concat([energy_efficiency_metainfo.reset_index(drop=True), pd.DataFrame(res)], axis=1)
# topic 4 is the one marked for removal above, so it is the column left out here
energy_efficiency_topic_probs = energy_efficiency_modeling_results.groupby("country").mean().loc[:,[0, 1, 2, 3, 5, 6, 7, 8, 9, 10]]

In [None]:
energy_efficiency_topic_probs

In [None]:
import scipy.spatial as sp
import scipy.cluster.hierarchy as hc
linkage = hc.linkage(energy_efficiency_topic_probs, method='average', metric='cosine')
energy_efficiency_similarities = sp.distance.squareform(sp.distance.pdist(energy_efficiency_topic_probs.values, metric='cosine'))

In [None]:
sns.clustermap(1-energy_efficiency_similarities, 
            xticklabels=energy_efficiency_topic_probs.index, 
            yticklabels=energy_efficiency_topic_probs.index,
            row_linkage=linkage, col_linkage=linkage,
            figsize=(12, 8))  # clustermap creates its own figure, so the size is set here
plt.show()

In [None]:
energy_efficiency_comparison = energy_efficiency_modeling_results.groupby(["country", "subsection"]).mean().loc[:,0:10]

In [None]:
countries = energy_efficiency_modeling_results.country.unique()
sections = ["Policies and Measures", "National Objectives and Targets"]

In [None]:
energy_efficiency_change = {"country": [], "similarity": []}
for country in countries:
  pm = energy_efficiency_modeling_results.loc[(energy_efficiency_modeling_results["country"] == country) &
                                        (energy_efficiency_modeling_results["subsection"] == sections[0])].loc[:,0:10]
  noat = energy_efficiency_modeling_results.loc[(energy_efficiency_modeling_results["country"] == country) & 
                                        (energy_efficiency_modeling_results["subsection"] == sections[1])].loc[:,0:10]
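  if pm.shape[0]==1 and noat.shape[0]==1:
    energy_efficiency_change["country"].append(country) 
    # flatten the single-row frames to 1-D vectors before computing the cosine distance
    energy_efficiency_change["similarity"].append(1-sp.distance.cosine(pm.values.ravel(), noat.values.ravel()))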
pd.DataFrame(energy_efficiency_change)

### **Dimension**: Energy security

In [None]:
energy_security_docs = necp_processed[(necp_processed['energy_union_dimension'] == "Energy security")]["necp_lemmas"]
energy_security_counter = Counter(energy_security_docs.sum()).most_common(30)
plot_counter(energy_security_counter)
plt.show()

In [None]:
energy_security_docs = energy_security_docs.apply(lambda doc: [lemma for lemma in doc if not (lemma in ['gas', 'supply', 'electricity', 'system', 'security'])])

In [None]:
energy_security_dictionary = Dictionary(energy_security_docs)
energy_security_dictionary.filter_extremes(no_below=2, no_above=1.0)
energy_security_encoded_docs = energy_security_docs.apply(energy_security_dictionary.doc2bow)

In [None]:
energy_security_models = []
for topics_number in tqdm(range(3, 13)):
    lda = LdaMulticore(energy_security_encoded_docs, num_topics=topics_number, passes=8, iterations=100, random_state=123)
    energy_security_models.append(lda)

In [None]:
energy_security_cvs = []
for model in tqdm(energy_security_models):
    cm = CoherenceModel(model,texts=energy_security_docs, dictionary=energy_security_dictionary)
    c_v = cm.get_coherence()
    energy_security_cvs.append(c_v)

In [None]:
px.line(x=range(3, 13), y=energy_security_cvs)

In [None]:
vis = pyLDAvis.gensim_models.prepare(energy_security_models[7], energy_security_encoded_docs, dictionary=energy_security_dictionary)
vis

In [None]:
for idx, topic in energy_security_models[7].show_topics(formatted=False, num_words=15):
    print('Topic: {} \nWords: {}'.format(idx, [energy_security_dictionary[int(w[0])] for w in topic]))

- Topic 0: wood, forest, crop, power, natural, natural gas, nuclear, demand
- Topic 1: Gas, source, import -- to be dropped (0% of tokens)
- Topic 2: natural_gas, natural, field, oil
- Topic 3: fuel, import, production, share
- Topic 4: increase, renewable, oil, demand
- Topic 5: storage, capacity, emergency
- Topic 6: increase, oil, heating, Agreement
- Topic 7: import, natural, renewable, consumption
- Topic 8: risk, ensure, regional
- Topic 9: Act, Regulation, Security Act



In [None]:
from matplotlib import colors
topics = energy_security_models[7].show_topics(formatted=False)
counter = Counter(energy_security_docs.sum())

out = []
for i, topic in topics:
    for word, weight in topic:
        word = energy_security_dictionary[int(word)]
        out.append([word, i , weight, counter[word]])

df = pd.DataFrame(out, columns=['word', 'topic_id', 'importance', 'word_count'])        

fig, axes = plt.subplots(3, 3, figsize=(21,12), sharey=True)
cols = [color for name, color in colors.TABLEAU_COLORS.items()]
for i, ax in enumerate(axes.flatten()):
    if i>=1:
      i+=1
    ax.bar(x='word', height="word_count", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.5, alpha=0.3, label='Word Count')
    ax_twin = ax.twinx()
    ax_twin.bar(x='word', height="importance", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.2, label='Weights')
    ax.set_ylabel('Word Count', color=cols[i])
    ax_twin.set_ylim(0, 0.025);
    ax.set_ylim(0, 2500)
    ax.set_title('Topic: ' + str(i), color=cols[i], fontsize=12)
    ax.tick_params(axis='y', left=False)
    ax.set_xticklabels(df.loc[df.topic_id==i, 'word'], rotation=30, horizontalalignment= 'right')
    ax.legend(loc='upper left'); ax_twin.legend(loc='upper right')
    ax.grid(False)
    ax_twin.grid(False)
fig.suptitle('Topics for dimension: Energy Security', fontsize=16)    
fig.tight_layout()    
plt.show()

In [None]:
energy_security_corpus_model = energy_security_models[7][energy_security_encoded_docs]

In [None]:
energy_security_metainfo = necp_processed[(necp_processed['energy_union_dimension'] == "Energy security")]
res_len = len(energy_security_metainfo)
res = np.zeros((res_len, 10))

In [None]:
for i, doc in enumerate(energy_security_corpus_model):
  for topic in doc:
    res[i][topic[0]] = np.round(topic[1], 4)

In [None]:
energy_security_modeling_results = pd.concat([energy_security_metainfo.reset_index(drop=True), pd.DataFrame(res)], axis=1)
# topic 1 is the one marked for removal above, so it is the column left out here
energy_security_topic_probs = energy_security_modeling_results.groupby("country").mean().loc[:,[0, 2, 3, 4, 5, 6, 7, 8, 9]]

In [None]:
energy_security_topic_probs

In [None]:
linkage = hc.linkage(energy_security_topic_probs, method='average', metric='cosine')
energy_security_similarities = sp.distance.squareform(sp.distance.pdist(energy_security_topic_probs.values, metric='cosine'))

In [None]:
sns.clustermap(1-energy_security_similarities, 
            xticklabels=energy_security_topic_probs.index, 
            yticklabels=energy_security_topic_probs.index,
            row_linkage=linkage, col_linkage=linkage,
            figsize=(12, 8))  # clustermap creates its own figure, so the size is set here
plt.show()

In [None]:
energy_security_comparison = energy_security_modeling_results.groupby(["country", "subsection"]).mean().loc[:,0:9]

In [None]:
countries = energy_security_modeling_results.country.unique()
sections = ["Policies and Measures", "National Objectives and Targets"]

In [None]:
energy_security_change = {"country": [], "similarity": []}
for country in countries:
  pm = energy_security_modeling_results.loc[(energy_security_modeling_results["country"] == country) &
                                        (energy_security_modeling_results["subsection"] == sections[0])].loc[:,0:9]
  noat = energy_security_modeling_results.loc[(energy_security_modeling_results["country"] == country) & 
                                        (energy_security_modeling_results["subsection"] == sections[1])].loc[:,0:9]
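  if pm.shape[0]==1 and noat.shape[0]==1:
    energy_security_change["country"].append(country) 
    # flatten the single-row frames to 1-D vectors before computing the cosine distance
    energy_security_change["similarity"].append(1-sp.distance.cosine(pm.values.ravel(), noat.values.ravel()))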
pd.DataFrame(energy_security_change)

### **Dimension**: Internal market

In [None]:
internal_market_docs = necp_processed[(necp_processed['energy_union_dimension'] == "Internal market")]["necp_lemmas"]
internal_market_counter = Counter(internal_market_docs.sum()).most_common(30)
plot_counter(internal_market_counter)
plt.show()

In [None]:
internal_market_docs = internal_market_docs.apply(lambda doc: [lemma for lemma in doc if not (lemma in ['electricity', 'market', 'gas', 'system'])])

In [None]:
internal_market_dictionary = Dictionary(internal_market_docs)
internal_market_dictionary.filter_extremes(no_below=2, no_above=1.0)
internal_market_encoded_docs = internal_market_docs.apply(internal_market_dictionary.doc2bow)

In [None]:
internal_market_models = []
for topics_number in tqdm(range(3, 13)):
    lda = LdaMulticore(internal_market_encoded_docs, num_topics=topics_number, passes=8, iterations=100, random_state=123)
    internal_market_models.append(lda)

In [None]:
internal_market_cvs = []
for model in tqdm(internal_market_models):
    cm = CoherenceModel(model,texts=internal_market_docs, dictionary=internal_market_dictionary)
    c_v = cm.get_coherence()
    internal_market_cvs.append(c_v)

In [None]:
px.line(x=range(3, 13), y=internal_market_cvs)

In [None]:
vis = pyLDAvis.gensim_models.prepare(internal_market_models[2], internal_market_encoded_docs, dictionary=internal_market_dictionary)
vis

In [None]:
for idx, topic in internal_market_models[2].show_topics(formatted=False, num_words=15):
    print('Topic: {} \nWords: {}'.format(idx, [internal_market_dictionary[int(w[0])] for w in topic]))

- Topic 0: Act, grid, grid_expansion, Network Agency
- Topic 1: development, new, renewable
- Topic 2: band, consumption_band -- to be dropped (0% of tokens)
- Topic 3: transmission, network, interconnection
- Topic 4: neutral, neutral gas, emission



In [None]:
from matplotlib import colors
topics = internal_market_models[2].show_topics(formatted=False)
counter = Counter(internal_market_docs.sum())

out = []
for i, topic in topics:
    for word, weight in topic:
        word = internal_market_dictionary[int(word)]
        out.append([word, i , weight, counter[word]])

df = pd.DataFrame(out, columns=['word', 'topic_id', 'importance', 'word_count'])        

fig, axes = plt.subplots(2, 2, figsize=(14,8), sharey=True)
cols = [color for name, color in colors.TABLEAU_COLORS.items()]
for i, ax in enumerate(axes.flatten()):
    if i>=2:
      i+=1
    ax.bar(x='word', height="word_count", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.5, alpha=0.3, label='Word Count')
    ax_twin = ax.twinx()
    ax_twin.bar(x='word', height="importance", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.2, label='Weights')
    ax.set_ylabel('Word Count', color=cols[i])
    ax_twin.set_ylim(0, 0.02);
    ax.set_ylim(0, 2500)
    ax.set_title('Topic: ' + str(i), color=cols[i], fontsize=12)
    ax.tick_params(axis='y', left=False)
    ax.set_xticklabels(df.loc[df.topic_id==i, 'word'], rotation=30, horizontalalignment= 'right')
    ax.legend(loc='upper left'); ax_twin.legend(loc='upper right')
    ax.grid(False)
    ax_twin.grid(False)
fig.suptitle('Topics for dimension: Internal market', fontsize=16)    
fig.tight_layout()    
plt.show()

In [None]:
internal_market_corpus_model = internal_market_models[2][internal_market_encoded_docs]

In [None]:
internal_market_metainfo = necp_processed[(necp_processed['energy_union_dimension'] == "Internal market")]
res_len = len(internal_market_metainfo)
res = np.zeros((res_len, 5))

In [None]:
for i, doc in enumerate(internal_market_corpus_model):
  for topic in doc:
    res[i][topic[0]] = np.round(topic[1], 4)

In [None]:
internal_market_modeling_results = pd.concat([internal_market_metainfo.reset_index(drop=True), pd.DataFrame(res)], axis=1)
# topic 2 is the one marked for removal above, so it is the column left out here
internal_market_topic_probs = internal_market_modeling_results.groupby("country").mean().loc[:,[0, 1, 3, 4]]

In [None]:
internal_market_topic_probs

In [None]:
linkage = hc.linkage(internal_market_topic_probs, method='average', metric='cosine')
internal_market_similarities = sp.distance.squareform(sp.distance.pdist(internal_market_topic_probs.values, metric='cosine'))

In [None]:
sns.clustermap(1-internal_market_similarities, 
            xticklabels=internal_market_topic_probs.index, 
            yticklabels=internal_market_topic_probs.index,
            row_linkage=linkage, col_linkage=linkage,
            figsize=(12, 8))  # clustermap creates its own figure, so the size is set here
plt.show()

In [None]:
internal_market_comparison = internal_market_modeling_results.groupby(["country", "subsection"]).mean().loc[:,0:4]

In [None]:
countries = internal_market_modeling_results.country.unique()
sections = ["Policies and Measures", "National Objectives and Targets"]

In [None]:
internal_market_change = {"country": [], "similarity": []}
for country in countries:
  pm = internal_market_modeling_results.loc[(internal_market_modeling_results["country"] == country) &
                                        (internal_market_modeling_results["subsection"] == sections[0])].loc[:,0:4]
  noat = internal_market_modeling_results.loc[(internal_market_modeling_results["country"] == country) & 
                                        (internal_market_modeling_results["subsection"] == sections[1])].loc[:,0:4]
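  if pm.shape[0]==1 and noat.shape[0]==1:
    internal_market_change["country"].append(country) 
    # flatten the single-row frames to 1-D vectors before computing the cosine distance
    internal_market_change["similarity"].append(1-sp.distance.cosine(pm.values.ravel(), noat.values.ravel()))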
pd.DataFrame(internal_market_change)

### **Dimension:** R&I and Competitiveness

In [None]:
research_docs = necp_processed[(necp_processed['energy_union_dimension'] == "R&I and Competitiveness")]["necp_lemmas"]
research_counter = Counter(research_docs.sum()).most_common(30)
plot_counter(research_counter)
plt.show()

In [None]:
research_docs = research_docs.apply(lambda doc: [lemma for lemma in doc if not (lemma in ['research'])])

In [None]:
research_dictionary = Dictionary(research_docs)
research_dictionary.filter_extremes(no_below=2, no_above=1.0)
research_encoded_docs = research_docs.apply(research_dictionary.doc2bow)

In [None]:
research_models = []
for topics_number in tqdm(range(3, 13)):
    lda = LdaMulticore(research_encoded_docs, num_topics=topics_number, passes=10, iterations=80, random_state=42)
    research_models.append(lda)

In [None]:
research_cvs = []
for model in tqdm(research_models):
    cm = CoherenceModel(model,texts=research_docs, dictionary=research_dictionary)
    c_v = cm.get_coherence()
    research_cvs.append(c_v)

In [None]:
px.line(x=range(3, 13), y=research_cvs)

In [None]:
vis = pyLDAvis.gensim_models.prepare(research_models[1], research_encoded_docs, dictionary=research_dictionary)
vis

In [None]:
for idx, topic in research_models[1].show_topics(formatted=False, num_words=15):
    print('Topic: {} \nWords: {}'.format(idx, [research_dictionary[int(w[0])] for w in topic]))

- Topic 0: price, electricity, gas, fuel, source, tax, subsidy, expenditure, household 
- Topic 1: innovation, technology, power, nuclear power, water, renewable, emission
- Topic 2: research innovation, technology, development, programme, cooperation, project, competitiveness
- Topic 3: duty, Act, excise, law, charge

In [None]:
from matplotlib import colors
topics = research_models[1].show_topics(formatted=False)
counter = Counter(research_docs.sum())

out = []
for i, topic in topics:
    for word, weight in topic:
        word = research_dictionary[int(word)]
        out.append([word, i , weight, counter[word]])

df = pd.DataFrame(out, columns=['word', 'topic_id', 'importance', 'word_count'])        

fig, axes = plt.subplots(2, 2, figsize=(14,8), sharey=True)
cols = [color for name, color in colors.TABLEAU_COLORS.items()]
for i, ax in enumerate(axes.flatten()):
    ax.bar(x='word', height="word_count", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.5, alpha=0.3, label='Word Count')
    ax_twin = ax.twinx()
    ax_twin.bar(x='word', height="importance", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.2, label='Weights')
    ax.set_ylabel('Word Count', color=cols[i])
    ax_twin.set_ylim(0, 0.02); ax.set_ylim(0, 2500)
    ax.set_title('Topic: ' + str(i), color=cols[i], fontsize=12)
    ax.tick_params(axis='y', left=False)
    ax.set_xticklabels(df.loc[df.topic_id==i, 'word'], rotation=30, horizontalalignment= 'right')
    ax.legend(loc='upper left'); ax_twin.legend(loc='upper right')
    ax.grid(False)
    ax_twin.grid(False)
fig.suptitle('Topics for dimension: R&I and Competitiveness', fontsize=16)    
fig.tight_layout()    
plt.show()

In [None]:
research_corpus_model = research_models[1][research_encoded_docs]

In [None]:
research_metainfo = necp_processed[(necp_processed['energy_union_dimension'] == "R&I and Competitiveness")]
res_len = len(research_metainfo)
res = np.zeros((res_len, 4))

In [None]:
for i, doc in enumerate(research_corpus_model):
  for topic in doc:
    res[i][topic[0]] = np.round(topic[1], 4)

In [None]:
research_modeling_results = pd.concat([research_metainfo.reset_index(drop=True), pd.DataFrame(res)], axis=1)
research_topic_probs = research_modeling_results.groupby("country").mean().loc[:,[0, 1, 2, 3]]
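
The three cells above turn the sparse `(topic_id, probability)` pairs returned by the model into a dense matrix and then average it per country. The matrix is pre-filled with zeros because gensim omits topics whose probability falls below its minimum-probability threshold. A pure-Python sketch of the same two steps, with hypothetical helper names and made-up data:

```python
from collections import defaultdict


def dense_topic_rows(sparse_docs, num_topics):
    """Expand sparse (topic_id, prob) pairs into fixed-length rows of probabilities."""
    rows = []
    for doc in sparse_docs:
        row = [0.0] * num_topics  # topics missing from the sparse output stay at 0
        for topic_id, prob in doc:
            row[topic_id] = round(prob, 4)
        rows.append(row)
    return rows


def mean_by_group(rows, groups):
    """Average rows that share the same group label (here: country)."""
    sums = {}
    counts = defaultdict(int)
    for row, group in zip(rows, groups):
        if group not in sums:
            sums[group] = [0.0] * len(row)
        sums[group] = [s + v for s, v in zip(sums[group], row)]
        counts[group] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


# Made-up sparse topic distributions for three documents and their countries.
sparse_docs = [[(0, 0.75), (2, 0.25)], [(1, 1.0)], [(0, 0.25), (2, 0.75)]]
countries_of_docs = ["Austria", "Poland", "Austria"]

rows = dense_topic_rows(sparse_docs, num_topics=3)
print(mean_by_group(rows, countries_of_docs))
```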

In [None]:
research_topic_probs

In [None]:
import scipy.spatial as sp
import scipy.cluster.hierarchy as hc
linkage = hc.linkage(research_topic_probs, method='average', metric='cosine')
research_similarities = sp.distance.squareform(sp.distance.pdist(research_topic_probs.values, metric='cosine'))

In [None]:
# clustermap creates its own figure, so the size is passed to it directly
sns.clustermap(1 - research_similarities,
               xticklabels=research_topic_probs.index,
               yticklabels=research_topic_probs.index,
               row_linkage=linkage, col_linkage=linkage,
               figsize=(12, 8))
plt.show()

In [None]:
research_comparison = research_modeling_results.groupby(["country", "subsection"]).mean().loc[:,0:3]

In [None]:
countries = research_modeling_results.country.unique()
sections = ["Policies and Measures", "National Objectives and Targets"]

In [None]:
research_change = {"country": [], "similarity": []}
for country in countries:
  country_rows = research_modeling_results.loc[research_modeling_results["country"] == country]
  pm = country_rows.loc[country_rows["subsection"] == sections[0], [0, 1, 2, 3]]
  noat = country_rows.loc[country_rows["subsection"] == sections[1], [0, 1, 2, 3]]
  # Compare only countries with exactly one row per subsection; cosine() expects 1-D vectors
  if pm.shape[0] == 1 and noat.shape[0] == 1:
    research_change["country"].append(country)
    research_change["similarity"].append(1 - sp.distance.cosine(pm.values.ravel(), noat.values.ravel()))
pd.DataFrame(research_change)
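
The comparison above measures how similar the topic mix of "Policies and Measures" is to that of "National Objectives and Targets" for each country, using cosine similarity (1 minus the cosine distance). A minimal sketch of that measure on plain lists follows; the function name and the example vectors are hypothetical, and returning NaN for an all-zero vector is a choice made for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; NaN if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return float("nan")
    return dot / (norm_u * norm_v)


# Made-up topic probabilities for the two subsections of one country.
policies = [0.10, 0.05, 0.80, 0.05]
targets = [0.15, 0.05, 0.70, 0.10]
print(cosine_similarity(policies, targets))
```

Values close to 1 indicate that both subsections concentrate on the same topics; values close to 0 indicate that they share almost none.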

### **Subsection:** Overview and Process for Establishing the Plan

In [None]:
overview_docs = necp_processed[(necp_processed['subsection'] == "Overview and Process for Establishing the Plan")]["necp_lemmas"]
overview_counter = Counter(overview_docs.sum()).most_common(30)
plot_counter(overview_counter)
plt.show()

In [None]:
overview_docs = overview_docs.apply(lambda doc: [lemma for lemma in doc if not (lemma in ['emission', 'renewable'])])

In [None]:
overview_dictionary = Dictionary(overview_docs)
overview_dictionary.filter_extremes(no_below=2, no_above=1.0)
overview_encoded_docs = overview_docs.apply(overview_dictionary.doc2bow)

In [None]:
overview_models = []
for topics_number in tqdm(range(3, 13)):
    lda = LdaMulticore(overview_encoded_docs, num_topics=topics_number, passes=8, iterations=100, random_state=123)
    overview_models.append(lda)

In [None]:
overview_cvs = []
for model in tqdm(overview_models):
    cm = CoherenceModel(model,texts=overview_docs, dictionary=overview_dictionary)
    c_v = cm.get_coherence()
    overview_cvs.append(c_v)

In [None]:
px.line(x=range(3, 13), y=overview_cvs)

In [None]:
vis = pyLDAvis.gensim_models.prepare(overview_models[0], overview_encoded_docs, dictionary=overview_dictionary)
vis

In [None]:
for idx, topic in overview_models[0].show_topics(formatted=False, num_words=15):
    print('Topic: {} \nWords: {}'.format(idx, [overview_dictionary[int(w[0])] for w in topic]))

- Topic 0: electricity, emission, renewable, system, country, regional
- Topic 1: project, wind, cooperation, nordic
- Topic 2: gas, market, efficiency, source, public, transport



In [None]:
overview_corpus_model = overview_models[0][overview_encoded_docs]

In [None]:
overview_metainfo = necp_processed[(necp_processed['subsection'] == "Overview and Process for Establishing the Plan")]
res_len = len(overview_metainfo)
res = np.zeros((res_len, 3))

In [None]:
for i, doc in enumerate(overview_corpus_model):
  for topic in doc:
    res[i][topic[0]] = np.round(topic[1], 4)

In [None]:
overview_modeling_results = pd.concat([overview_metainfo.reset_index(drop=True), pd.DataFrame(res)], axis=1)
overview_topic_probs = overview_modeling_results.groupby("country").mean().loc[:,[0, 1, 2]]

In [None]:
overview_topic_probs

In [None]:
import scipy.spatial as sp
import scipy.cluster.hierarchy as hc
linkage = hc.linkage(overview_topic_probs, method='average', metric='cosine')
overview_similarities = sp.distance.squareform(sp.distance.pdist(overview_topic_probs.values, metric='cosine'))

In [None]:
# clustermap creates its own figure, so the size is passed to it directly
sns.clustermap(1 - overview_similarities,
               xticklabels=overview_topic_probs.index,
               yticklabels=overview_topic_probs.index,
               row_linkage=linkage, col_linkage=linkage,
               figsize=(12, 8))
plt.show()

In [None]:
from matplotlib import colors
topics = overview_models[0].show_topics(formatted=False)
counter = Counter(overview_docs.sum())

out = []
for i, topic in topics:
    for word, weight in topic:
        word = overview_dictionary[int(word)]
        out.append([word, i , weight, counter[word]])

df = pd.DataFrame(out, columns=['word', 'topic_id', 'importance', 'word_count'])        

fig, axes = plt.subplots(1, 3, figsize=(21, 4), sharey=True)
cols = [color for name, color in colors.TABLEAU_COLORS.items()]

for i, ax in enumerate(axes.flatten()):
    ax.bar(x='word', height="word_count", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.5, alpha=0.3, label='Word Count')
    ax_twin = ax.twinx()
    ax_twin.bar(x='word', height="importance", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.2, label='Weights')
    ax.set_ylabel('Word Count', color=cols[i])
    ax_twin.set_ylim(0, 0.0125); ax.set_ylim(0, 1500)
    ax.set_title('Topic: ' + str(i), color=cols[i], fontsize=12)
    ax.tick_params(axis='y', left=False)
    ax.set_xticklabels(df.loc[df.topic_id==i, 'word'], rotation=30, horizontalalignment= 'right')
    ax.legend(loc='upper left'); ax_twin.legend(loc='upper right')
    ax.grid(False)
    ax_twin.grid(False)
fig.suptitle('Topics for Overview and Process for Establishing the Plan', fontsize=16)    
fig.tight_layout()    
plt.show()

### Overview and Process for Establishing the Plan with the removal of most common lemmas

In [None]:
overview_docs = overview_docs.apply(lambda doc: [lemma for lemma in doc if not (lemma in ['electricity', 'gas', 'renewable', 'emission'])])

In [None]:
overview_dictionary = Dictionary(overview_docs)
overview_dictionary.filter_extremes(no_below=2, no_above=1.0)
overview_encoded_docs = overview_docs.apply(overview_dictionary.doc2bow)

In [None]:
overview_models = []
for topics_number in tqdm(range(3, 13)):
    lda = LdaMulticore(overview_encoded_docs, num_topics=topics_number, passes=8, iterations=100, random_state=123)
    overview_models.append(lda)

In [None]:
overview_cvs = []
for model in tqdm(overview_models):
    cm = CoherenceModel(model,texts=overview_docs, dictionary=overview_dictionary)
    c_v = cm.get_coherence()
    overview_cvs.append(c_v)

In [None]:
px.line(x=range(3, 13), y=overview_cvs)

In [None]:
vis = pyLDAvis.gensim_models.prepare(overview_models[6], overview_encoded_docs, dictionary=overview_dictionary)
vis

In [None]:
for idx, topic in overview_models[6].show_topics(formatted=False, num_words=15):
    print('Topic: {} \nWords: {}'.format(idx, [overview_dictionary[int(w[0])] for w in topic]))

- Topic 0: Region, offshore, work, wind, seas
- Topic 1: cooperation, market, building, nordic, north, wind
- Topic 2: 0%
- Topic 3: INECP, RES, development, market, system, EE
- Topic 4: efficiency, increase, Resolution, approve
- Topic 5: Baltic, development, officials
- Topic 6: document, level, strategic, source
- Topic 7: 0%
- Topic 8: supply, increase, tax, province



In [None]:
overview_corpus_model = overview_models[6][overview_encoded_docs]

In [None]:
overview_metainfo = necp_processed[(necp_processed['subsection'] == "Overview and Process for Establishing the Plan")]
res_len = len(overview_metainfo)
res = np.zeros((res_len, 9))  # overview_models[6] has 9 topics (index 6 of range(3, 13))

In [None]:
for i, doc in enumerate(overview_corpus_model):
  for topic in doc:
    res[i][topic[0]] = np.round(topic[1], 4)

In [None]:
overview_modeling_results = pd.concat([overview_metainfo.reset_index(drop=True), pd.DataFrame(res)], axis=1)
overview_topic_probs = overview_modeling_results.groupby("country").mean().loc[:,[0, 1, 2, 3, 4, 5, 6, 7, 8]]

In [None]:
overview_topic_probs

In [None]:
import scipy.spatial as sp
import scipy.cluster.hierarchy as hc
linkage = hc.linkage(overview_topic_probs, method='average', metric='cosine')
overview_similarities = sp.distance.squareform(sp.distance.pdist(overview_topic_probs.values, metric='cosine'))

In [None]:
# clustermap creates its own figure, so the size is passed to it directly
sns.clustermap(1 - overview_similarities,
               xticklabels=overview_topic_probs.index,
               yticklabels=overview_topic_probs.index,
               row_linkage=linkage, col_linkage=linkage,
               figsize=(12, 8))
plt.show()

In [None]:
necp_processed

### **Subsection:** Impact Assessment of Planned Policies and Measures

In [None]:
impact_docs = necp_processed[(necp_processed['subsection'] == "Impact Assessment of Planned Policies and Measures")]["necp_lemmas"]
impact_counter = Counter(impact_docs.sum()).most_common(30)
plot_counter(impact_counter)
plt.show()

In [None]:
impact_docs = impact_docs.apply(lambda doc: [lemma for lemma in doc if not (lemma in ['emission', 'scenario'])])

In [None]:
impact_dictionary = Dictionary(impact_docs)
impact_dictionary.filter_extremes(no_below=2, no_above=1.0)
impact_encoded_docs = impact_docs.apply(impact_dictionary.doc2bow)

In [None]:
impact_models = []
for topics_number in tqdm(range(3, 13)):
    lda = LdaMulticore(impact_encoded_docs, num_topics=topics_number, passes=8, iterations=100, random_state=123)
    impact_models.append(lda)

In [None]:
impact_cvs = []
for model in tqdm(impact_models):
    cm = CoherenceModel(model,texts=impact_docs, dictionary=impact_dictionary)
    c_v = cm.get_coherence()
    impact_cvs.append(c_v)

In [None]:
px.line(x=range(3, 13), y=impact_cvs)

In [None]:
vis = pyLDAvis.gensim_models.prepare(impact_models[6], impact_encoded_docs, dictionary=impact_dictionary)
vis

In [None]:
for idx, topic in impact_models[6].show_topics(formatted=False, num_words=15):
    print('Topic: {} \nWords: {}'.format(idx, [impact_dictionary[int(w[0])] for w in topic]))

- Topic 0: transport, renewable, consumption, GHG (Greenhouse Gases), biofuel
- Topic 1: investment, efficiency, increase, impact, financing
- Topic 2: project, expect, electricity, heat pump, pam
- Topic 3: impact, reduce, increase, positive
- Topic 4: investment, electricity, increase, WAM, gas, programme
- Topic 5: consumption, increase, term, carbon, INECP (Integrated National Energy and Climate Plan)
- Topic 6: 0%
- Topic 7: fuel, impact, source, REF (most likely the reference scenario), plant, Annex
- Topic 8: cost, PPM, WEM, investment



In [None]:
impact_corpus_model = impact_models[6][impact_encoded_docs]

In [None]:
impact_metainfo = necp_processed[(necp_processed['subsection'] == "Impact Assessment of Planned Policies and Measures")]
res_len = len(impact_metainfo)
res = np.zeros((res_len, 9))

In [None]:
for i, doc in enumerate(impact_corpus_model):
  for topic in doc:
    res[i][topic[0]] = np.round(topic[1], 4)

In [None]:
impact_modeling_results = pd.concat([impact_metainfo.reset_index(drop=True), pd.DataFrame(res)], axis=1)
impact_topic_probs = impact_modeling_results.groupby("country").mean().loc[:,[0, 1, 2, 3, 4, 5, 6, 7, 8]]

In [None]:
impact_topic_probs

In [None]:
import scipy.spatial as sp
import scipy.cluster.hierarchy as hc
# Cluster on the impact-assessment topic probabilities (not the overview ones)
linkage = hc.linkage(impact_topic_probs, method='average', metric='cosine')
impact_similarities = sp.distance.squareform(sp.distance.pdist(impact_topic_probs.values, metric='cosine'))

In [None]:
# clustermap creates its own figure, so the size is passed to it directly
sns.clustermap(1 - impact_similarities,
               xticklabels=impact_topic_probs.index,
               yticklabels=impact_topic_probs.index,
               row_linkage=linkage, col_linkage=linkage,
               figsize=(12, 8))
plt.show()

In [None]:
from matplotlib import colors
topics = impact_models[6].show_topics(formatted=False)
counter = Counter(impact_docs.sum())

out = []
for i, topic in topics:
    for word, weight in topic:
        word = impact_dictionary[int(word)]
        out.append([word, i , weight, counter[word]])

df = pd.DataFrame(out, columns=['word', 'topic_id', 'importance', 'word_count'])        

fig, axes = plt.subplots(3, 3, figsize=(21,12), sharey=True)
cols = [color for name, color in colors.TABLEAU_COLORS.items()]

for i, ax in enumerate(axes.flatten()):
    ax.bar(x='word', height="word_count", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.5, alpha=0.3, label='Word Count')
    ax_twin = ax.twinx()
    ax_twin.bar(x='word', height="importance", data=df.loc[df.topic_id==i, :], color=cols[i], width=0.2, label='Weights')
    ax.set_ylabel('Word Count', color=cols[i])
    ax_twin.set_ylim(0, 0.025); ax.set_ylim(0, 1500)
    ax.set_title('Topic: ' + str(i), color=cols[i], fontsize=12)
    ax.tick_params(axis='y', left=False)
    ax.set_xticklabels(df.loc[df.topic_id==i, 'word'], rotation=30, horizontalalignment= 'right')
    ax.legend(loc='upper left'); ax_twin.legend(loc='upper right')
    ax.grid(False)
    ax_twin.grid(False)
fig.suptitle('Topics for Impact Assessment of Planned Policies and Measures', fontsize=16)    
fig.tight_layout()    
plt.show()