# Extract raw text from party manifestos

Party manifestos ("Parteiprogramme") are a useful source for NLP projects because they capture a political party's positions and intentions at a given point in time. These can then be contrasted with later political developments and the corresponding textual sources (e.g., parliamentary protocols).

The code below extracts the raw text from the party manifesto PDFs, which were retrieved from the following URLs on Jan 28, 2024:
- https://www.abgeordnetenwatch.de/bundestag/wahl-2021/wahlprogramme
- https://www.abgeordnetenwatch.de/bundestag/wahl-2017/wahlprogramme

In [1]:
from src.helpers import extract_pdf_viewport

### 2017 manifestos 

In [2]:
gruene_2017 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2017-gruene.pdf",
    txt_target="data/manifestos/txt/2017-gruene.txt",
    margins=[75, 50, 75, 50],  # top, right, bottom, left page margin
    page_range=[7, 238],  # both ends inclusive, page count starts at one
)
spd_2017 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2017-spd.pdf",
    txt_target="data/manifestos/txt/2017-spd.txt",
    margins=[50, 20, 30, 20],
    page_range=[6, 113],
)
cdu_2017 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2017-cdu.pdf",
    txt_target="data/manifestos/txt/2017-cdu.txt",
    margins=[75, 50, 75, 50],
    page_range=[5, 76],
)
linke_2017 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2017-linke.pdf",
    txt_target="data/manifestos/txt/2017-linke.txt",
    margins=[30, 30, 40, 30],
    page_range=[7, 127],
)
fdp_2017 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2017-fdp.pdf",
    txt_target="data/manifestos/txt/2017-fdp.txt",
    margins=[50, 50, 50, 50],
    page_range=[9, 94],
)
afd_2017 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2017-afd.pdf",
    txt_target="data/manifestos/txt/2017-afd.txt",
    margins=[50, 60, 30, 60],
    page_range=[5, 72],
)
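
The `margins` and `page_range` arguments used above can be illustrated with a small sketch. It is not the implementation of `extract_pdf_viewport` (which lives in `src.helpers` and is not shown here); the names `Viewport`, `viewport_from_margins` and `page_indices` are hypothetical. It assumes that margins are given in PDF points, that the origin is the top-left corner of the page, and, for the example values only, that the page is A4 (595 x 842 pt).

```python
from typing import NamedTuple


class Viewport(NamedTuple):
    """Rectangle to keep; assumed to be in PDF points with a top-left origin."""
    x0: float
    y0: float
    x1: float
    y1: float


def viewport_from_margins(page_width, page_height, margins):
    """Shrink the page rectangle by [top, right, bottom, left] margins."""
    top, right, bottom, left = margins
    viewport = Viewport(left, top, page_width - right, page_height - bottom)
    if viewport.x0 >= viewport.x1 or viewport.y0 >= viewport.y1:
        raise ValueError("margins leave no visible area on the page")
    return viewport


def page_indices(page_range):
    """Turn a 1-based, inclusive [first, last] range into 0-based indices."""
    first, last = page_range
    if first < 1 or last < first:
        raise ValueError("page_range must satisfy 1 <= first <= last")
    return range(first - 1, last)


# Example values: an assumed A4 page and the arguments used for 2017-gruene.pdf.
viewport = viewport_from_margins(595, 842, [75, 50, 75, 50])
indices = page_indices([7, 238])
```

Under these assumptions, text in headers, footers and side margins falls outside the viewport, and `page_range=[7, 238]` covers 232 pages, skipping front matter such as the cover and table of contents.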

### 2021 manifestos

In [3]:
gruene_2021 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2021-gruene.pdf",
    txt_target="data/manifestos/txt/2021-gruene.txt",
    margins=[75, 50, 75, 50],  # top, right, bottom, left
    page_range=[9, 258],
)
spd_2021 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2021-spd.pdf",
    txt_target="data/manifestos/txt/2021-spd.txt",
    margins=[80, 20, 50, 20],
    page_range=[3, 65],
)
cdu_2021 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2021-cdu.pdf",
    txt_target="data/manifestos/txt/2021-cdu.txt",
    margins=[50, 50, 50, 70],
    page_range=[5, 140],
)
linke_2021 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2021-linke.pdf",
    txt_target="data/manifestos/txt/2021-linke.txt",
    margins=[30, 30, 40, 30],
    page_range=[7, 155],
)
fdp_2021 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2021-fdp.pdf",
    txt_target="data/manifestos/txt/2021-fdp.txt",
    margins=[30, 30, 60, 30],
    page_range=[4, 91],
)
afd_2021 = extract_pdf_viewport(
    pdf_origin="data/manifestos/pdf/2021-afd.pdf",
    txt_target="data/manifestos/txt/2021-afd.txt",
    margins=[50, 105, 30, 105],
    page_range=[7, 101],
)
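
Once the cells above have run, a quick sanity check on the output files can catch margins or page ranges that cut away all text. The following is a minimal sketch, assuming that `extract_pdf_viewport` writes the extracted text to `txt_target` as UTF-8; the helper names `summarize_txt_files` and `find_empty` are hypothetical and not part of `src.helpers`.

```python
from pathlib import Path


def summarize_txt_files(txt_dir):
    """Return {file name: character count} for every .txt file in txt_dir."""
    summary = {}
    for path in sorted(Path(txt_dir).glob("*.txt")):
        # Assumption: the extracted text files are UTF-8 encoded.
        summary[path.name] = len(path.read_text(encoding="utf-8"))
    return summary


def find_empty(summary):
    """List the files whose extracted text is empty."""
    return [name for name, n_chars in summary.items() if n_chars == 0]
```

Calling `summarize_txt_files("data/manifestos/txt")` would then give one character count per manifesto, and `find_empty` on that result points at any file that needs its `margins` or `page_range` revisited.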