# Data Augmentation

---

The goal here is to enrich our dataset. 


Since we do not want to rely solely on the labelled data we were given, we will also exploit external sources, in particular the [SDG Knowledge website](https://sdg.iisd.org/).

We will do the following:


1.   Use the SDG target descriptions and indicators to produce new labelled data.
2.   Scrape the [SDG Knowledge website](https://sdg.iisd.org/) in a targeted way, keeping articles tagged (almost) only with the SDGs we study.



In [None]:
import sys
import os
import json
from pprint import pprint
import regex as re
import requests as rq
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
path = "/content/drive/MyDrive/Hackathon_ISEP/Data/"
sys.path.append(path)
os.environ['DRIVE_PATH'] = path.replace(' ', ' ')
print(os.environ['DRIVE_PATH'])

/content/drive/MyDrive/Hackathon_ISEP/Data/


## Targets and Indicators exploitation

---



### Load

In [None]:
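# "ODD" (Objectifs de Développement Durable) is the French acronym for SDG.
odd_ids = [12, 15, 16]                  # the SDGs studied in this notebook
odd2ind = {12: 0, 15: 1, 16: 2}         # SDG number -> position in the per-SDG lists
def odd(i): 
    return odd2ind[i]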

In [None]:
df_filtered = pd.read_csv(path+"raw_filtered.csv")

df_cibles_12 = pd.read_excel(path+"afd_snaps_labeled_cibles.xlsx", sheet_name="SDG 12")
df_cibles_15 = pd.read_excel(path+"afd_snaps_labeled_cibles.xlsx", sheet_name="SDG 15")
df_cibles_16 = pd.read_excel(path+"afd_snaps_labeled_cibles.xlsx", sheet_name="SDG 16")

In [None]:
df_cibles = [df_cibles_12, df_cibles_15, df_cibles_16]

In [None]:
df_filtered.shape

(276, 19)

### Explore

In [None]:
df_cibles_12.head()

|   | Text | Manual_1 | Manual_2 |
|---|---|---|---|
| 0 | provinces new domestic waste treatment capacit... | 12.4 | 0 |
| 1 | Water quality and flow Water contamination due... | 12.4 | 0 |
| 2 | The pollution reduction targets refer to i the... | 12.5 | 0 |
| 3 | From the analysis of the relevant environmenta... | 0.0 | 0 |
| 4 | be recovered by producers dealers and users Me... | 12.4 | 0 |


In [None]:
for i, odd_id in enumerate(odd_ids):
    print(f'----- ODD {odd_id} -----')
    print(df_cibles[i].shape)
    print(df_cibles[i]['Manual_1'].value_counts())
    print(df_cibles[i]['Manual_2 '].value_counts())
    print('\n')

----- ODD 12 -----
(50, 3)
12.4    27
12.5    11
0        6
12.a     2
12.3     2
12.7     1
12.2     1
Name: Manual_1, dtype: int64
0       33
12.5     7
12.4     5
12.2     2
12.a     1
12.6     1
12.7     1
Name: Manual_2 , dtype: int64


----- ODD 15 -----
(50, 3)
15.1    15
0.0     14
15.5     7
15.2     6
15.3     4
15.6     4
Name: Manual_1, dtype: int64
0       36
15.5     4
15.1     4
15.2     2
15.b     2
15.3     1
15.6     1
Name: Manual_2 , dtype: int64


----- ODD 16 -----
(50, 3)
0        11
16.6     10
16.5     10
16.7      7
16.a      5
16.1      4
16.10     1
16.3      1
16.b      1
Name: Manual_1, dtype: int64
0        31
16.7      4
16.6      3
16.3      2
16.5      2
16.10     2
16.4      2
16.1      2
16.a      1
16.b      1
Name: Manual_2 , dtype: int64




### Extract *target* descriptions and *indicators*

In [None]:
targets_desc_json = []
for odd_id in odd_ids:
    goal_target_url = f"https://unstats.un.org/SDGAPI/v1/sdg/Goal/{odd_id}/Target/List?includechildren=true"
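    response = rq.get(goal_target_url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    targets_desc_json.append(response.json())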

In [None]:
def create_dataframe_targets_desc_kpi():
    result = []
    for odd_id in odd_ids:
        list_desc_and_kpi = targets_desc_json[odd2ind[odd_id]][0]['targets']
        for desc_kpi in list_desc_and_kpi:
            desc_id = desc_kpi['code']
            desc_str = desc_kpi['description']
            result.append([desc_id, None, desc_str])

            list_kpi = desc_kpi['indicators']
            for kpi in list_kpi:
                kpi_id = kpi['code']
                kpi_str = kpi['description']
                result.append([desc_id, kpi_id, kpi_str])
    return pd.DataFrame(result, columns=['desc_id', 'kpi_id', 'description'])
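`create_dataframe_targets_desc_kpi` walks a nested payload: a list holding one goal object, whose `targets` list gives each target's `code` and `description` plus an `indicators` list with the same two keys. The sketch below runs the same flattening on a tiny hand-written payload so the row layout can be checked offline. It assumes only those keys; the descriptions are placeholders, not real API text, and `flatten_targets` is a hypothetical name.

```python
import pandas as pd

# Hypothetical, hand-written payload mimicking only the keys read above:
# a list with one goal object -> 'targets' -> 'indicators'.
sample_payload = [{
    "targets": [
        {"code": "12.1", "description": "target 12.1 text",
         "indicators": [{"code": "12.1.1", "description": "indicator 12.1.1 text"}]},
        {"code": "12.2", "description": "target 12.2 text",
         "indicators": [{"code": "12.2.1", "description": "indicator 12.2.1 text"},
                        {"code": "12.2.2", "description": "indicator 12.2.2 text"}]},
    ]
}]

def flatten_targets(payload):
    """One row per target (kpi_id=None), followed by one row per indicator of that target."""
    rows = []
    for target in payload[0]["targets"]:
        rows.append([target["code"], None, target["description"]])
        for indicator in target.get("indicators", []):
            rows.append([target["code"], indicator["code"], indicator["description"]])
    return pd.DataFrame(rows, columns=["desc_id", "kpi_id", "description"])

df_sample = flatten_targets(sample_payload)
print(df_sample)
```

Using `target.get("indicators", [])` instead of indexing the key directly is a small design choice: a target returned without indicators would still produce its own row instead of raising a `KeyError`.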

In [None]:
df_targets_kpis = create_dataframe_targets_desc_kpi()

In [None]:
df_targets_kpis

|     | desc_id | kpi_id | description |
|---|---|---|---|
| 0 | 12.1 |  | Implement the 10-Year Framework of Programmes ... |
| 1 | 12.1 | 12.1.1 | Number of countries developing, adopting or im... |
| 2 | 12.2 |  | By 2030, achieve the sustainable management an... |
| 3 | 12.2 | 12.2.1 | Material footprint, material footprint per cap... |
| 4 | 12.2 | 12.2.2 | Domestic material consumption, domestic materi... |
| ... | ... | ... | ... |
| 81 | 16.10 | 16.10.2 | Number of countries that adopt and implement c... |
| 82 | 16.a |  | Strengthen relevant national institutions, inc... |
| 83 | 16.a | 16.a.1 | Existence of independent national human rights... |
| 84 | 16.b |  | Promote and enforce non-discriminatory laws an... |


In [None]:
df_targets_kpis.to_csv(path+'targets_desc_and_kpi.csv')

In [None]:
df_targets_kpis = pd.read_csv(path+'targets_desc_and_kpi.csv')
df_targets_kpis.drop(columns=['Unnamed: 0'], inplace=True)
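The `Unnamed: 0` column dropped above is the DataFrame index, which `to_csv` writes by default. A minimal sketch showing that writing with `index=False` makes the round trip clean; an in-memory buffer stands in for the Drive file.

```python
import io
import pandas as pd

df = pd.DataFrame({"desc_id": ["12.1", "12.1"], "kpi_id": [None, "12.1.1"]})

# Default: the index is written as an unnamed first column.
buf_default = io.StringIO()
df.to_csv(buf_default)
buf_default.seek(0)
with_index = pd.read_csv(buf_default)

# index=False: only the real columns are written.
buf_clean = io.StringIO()
df.to_csv(buf_clean, index=False)
buf_clean.seek(0)
without_index = pd.read_csv(buf_clean)

print(list(with_index.columns))     # ['Unnamed: 0', 'desc_id', 'kpi_id']
print(list(without_index.columns))  # ['desc_id', 'kpi_id']
```

A related caveat: if a code column could ever contain only numeric-looking values, passing `dtype=str` to `read_csv` keeps a code such as `'16.10'` from being parsed as the float `16.1`.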

### Add target descriptions and indicators to the Excel-labelled snaps

In [None]:
def augment_dataset(odd_id):
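    # keep only this goal's targets/indicators (codes look like '12.4', '16.10', '16.a')
    is_odd_id = df_targets_kpis['desc_id'].astype(str).str.startswith(f"{odd_id}.")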
    df_aug_odd_id = df_targets_kpis[is_odd_id].copy()

    # Drop kpi_id
    df_aug_odd_id.drop(columns=['kpi_id'], inplace=True)

    # Add column of 0 as Manual_2
    zeros = [0] * df_aug_odd_id.shape[0]
    df_aug_odd_id['Manual_2 '] = zeros

    # Rename columns
    df_aug_odd_id.rename(columns={'desc_id': 'Manual_1', 'description': 'Text'}, inplace=True)

    # Concat
    df_aug_odd_id = pd.concat([df_cibles[odd(odd_id)], df_aug_odd_id])

    return df_aug_odd_id
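Selecting a goal's targets by code prefix is safer than substring matching. For goals 12, 15 and 16 the two agree, but `'1'` is a substring of `'11.1'` and `'12.4'`, so substring matching would mis-assign codes if the notebook were extended to all 17 goals. A sketch of an anchored match on made-up codes (`targets_of` is a hypothetical helper):

```python
import pandas as pd

codes = pd.Series(["1.2", "11.1", "12.4", "16.1", "16.10", "16.a"])

def targets_of(codes, goal_id):
    """Boolean mask of the codes that belong to goal `goal_id` (e.g. '16.10' for 16)."""
    return codes.astype(str).str.startswith(f"{goal_id}.")

print(codes[codes.str.contains("1")].tolist())   # substring match keeps every code
print(codes[targets_of(codes, 1)].tolist())      # ['1.2']
print(codes[targets_of(codes, 16)].tolist())     # ['16.1', '16.10', '16.a']
```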

In [None]:
df_aug_12 = augment_dataset(12)
df_aug_15 = augment_dataset(15)
df_aug_16 = augment_dataset(16)

In [None]:
df_aug_12.to_csv(path+"aug_cibles_12.csv")
df_aug_15.to_csv(path+"aug_cibles_15.csv")
df_aug_16.to_csv(path+"aug_cibles_16.csv")
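The augmented frames combine labels read from Excel, where numeric-looking codes such as `12.4` or `0.0` may come back as floats, with `desc_id` codes from the API, which are strings. Because `12.4` and `'12.4'` are different values for pandas comparisons and `value_counts`, it can help to normalize every label to a string before training. A minimal sketch, assuming the only numeric labels are target codes and `0` for "no target" (`normalize_code` is a hypothetical helper):

```python
import pandas as pd

def normalize_code(value):
    """Return a target code as a string; 0, 0.0 and '0' all become '0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))           # 0.0 -> '0'
    return str(value).strip()            # 12.4 -> '12.4', '12.a' -> '12.a'

labels = pd.Series([12.4, "12.4", 0.0, 0, "12.a"], dtype=object)
normalized = labels.map(normalize_code)
print(normalized.value_counts().to_dict())
```

This cannot recover information already lost in Excel: a code like `16.10` stored as a number has already become `16.1`.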

## Web scraping

---


Scrape the text of articles from http://sdg.iisd.org/news/.

In [None]:
url = "https://sdg.iisd.org/news/unccds-global-land-outlook-calls-for-activating-land-restoration-agenda/"
page = rq.get(url)

In [None]:
sdg_names = [f'SDG{i}' for i in range(1, 18)]
sdg_names

['SDG1',
 'SDG2',
 'SDG3',
 'SDG4',
 'SDG5',
 'SDG6',
 'SDG7',
 'SDG8',
 'SDG9',
 'SDG10',
 'SDG11',
 'SDG12',
 'SDG13',
 'SDG14',
 'SDG15',
 'SDG16',
 'SDG17']

In [None]:
def get_html_code(url):
    page = rq.get(url)
    soup = BeautifulSoup(page.content, "html.parser") # HTML code of the page
    return soup

In [None]:
def get_text(soup):
    job_elements = soup.find_all("div", class_="text -normal content") # Article body, containing the <p> and <li> tags

    paragraphs = []

    for job_element in job_elements:

        # Extract text from <p> tags
        paragraph_elements = job_element.find_all("p")
        for paragraph_element in paragraph_elements:
            paragraphs.append(paragraph_element.text)

        # Extract text from <li> tags
        li_elements = job_element.find_all("li")
        for li_element in li_elements:
            paragraphs.append(li_element.text)

    return paragraphs

In [None]:
def get_sdgs(soup):
    sdg_elements = soup.find_all("span", class_="sdg")
    sdg_ids = [int(sdg_element.text.split('.')[0]) for sdg_element in sdg_elements]
    return sdg_ids
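To check the two selectors without hitting the site, the sketch below runs the same `find_all` calls on a small hand-written HTML snippet. The markup is hypothetical: it only assumes what the functions above assume, namely an article body in a `div` whose class attribute is exactly `text -normal content`, and SDG tags in `span` elements with class `sdg` whose text starts with the goal number followed by a period.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reproducing only the structure the scraper relies on.
html = """
<div class="text -normal content">
  <p>First paragraph of the article.</p>
  <ul><li>A bullet point.</li></ul>
  <p>Second paragraph.</p>
</div>
<span class="sdg">12. Responsible Consumption and Production</span>
<span class="sdg">15. Life on Land</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Same logic as get_text: all <p> first, then all <li>, per content div.
paragraphs = []
for content in soup.find_all("div", class_="text -normal content"):
    paragraphs += [p.text for p in content.find_all("p")]
    paragraphs += [li.text for li in content.find_all("li")]

# Same logic as get_sdgs: the goal number precedes the first '.'.
sdg_ids = [int(span.text.split(".")[0]) for span in soup.find_all("span", class_="sdg")]

print(paragraphs)  # ['First paragraph of the article.', 'Second paragraph.', 'A bullet point.']
print(sdg_ids)     # [12, 15]
```

Note that this grouping collects all `<p>` text before all `<li>` text, so list items lose their position in the article, and a `<p>` nested inside an `<li>` would be collected twice.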

In [None]:
def get_labels(sdg_ids):
    row = [0]*17
    for sdg_id in sdg_ids:
        row[sdg_id-1] = 1
    return row

In [None]:
def create_dataset(url, columns):
    soup = get_html_code(url)
    text = get_text(soup)
    sdg_ids = get_sdgs(soup)
    one_hot_labels = get_labels(sdg_ids)

    result = []
    for paragraph in text:
        row = [paragraph]
        row.extend(one_hot_labels)
        row.extend([url])
        result.append(row)
    return pd.DataFrame(result, columns=columns)

In [None]:
columns = ["text"]
columns.extend(sdg_names)
columns.extend(["url"])
columns

['text',
 'SDG1',
 'SDG2',
 'SDG3',
 'SDG4',
 'SDG5',
 'SDG6',
 'SDG7',
 'SDG8',
 'SDG9',
 'SDG10',
 'SDG11',
 'SDG12',
 'SDG13',
 'SDG14',
 'SDG15',
 'SDG16',
 'SDG17',
 'url']

In [None]:
res = create_dataset(url, columns)

In [None]:
res.head(1)

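|   | text | SDG1 | SDG2 | SDG3 | SDG4 | SDG5 | SDG6 | SDG7 | SDG8 | SDG9 | SDG10 | SDG11 | SDG12 | SDG13 | SDG14 | SDG15 | SDG16 | SDG17 | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The UN Convention to Combat Desertification (U... | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | https://sdg.iisd.org/news/unccds-global-land-o... |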


**Articles tagged almost exclusively with SDG 12**

In [None]:
urls_ODD12 = [
              "http://sdg.iisd.org/commentary/policy-briefs/sdg-knowledge-weekly-circular-economy-transitions-and-resource-use/",
              "http://sdg.iisd.org/commentary/policy-briefs/report-evaluates-tuvalus-progress-towards-improving-waste-management/",
              "http://sdg.iisd.org/commentary/policy-briefs/shipping-partnership-advances-waste-management-in-pacific-islands/",
              "http://sdg.iisd.org/commentary/policy-briefs/wto-members-launch-open-ended-informal-dialogue-to-promote-sustainable-plastics-economy/"
            ]

In [None]:
# DataFrame.append was removed in pandas 2.0: build one frame per URL and concatenate once
res_12 = pd.concat([create_dataset(url, columns) for url in urls_ODD12], ignore_index=True)
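Building each SDG corpus amounts to scraping a list of URLs into one frame. A sketch of a reusable helper that collects one frame per URL, concatenates once, and optionally pauses between requests to be gentle with the server. `scrape_many` is hypothetical; it accepts any per-URL function returning a DataFrame, so it can be tested with a stub instead of real HTTP requests.

```python
import time
import pandas as pd

def scrape_many(urls, scrape_one, delay=1.0):
    """Call scrape_one(url) for each URL and return a single concatenated DataFrame."""
    frames = []
    for i, url in enumerate(urls):
        if i > 0 and delay:
            time.sleep(delay)            # pause between consecutive requests
        frames.append(scrape_one(url))
    return pd.concat(frames, ignore_index=True)

# Usage with a stub instead of a real HTTP request:
def fake_scrape(url):
    return pd.DataFrame({"text": [f"paragraph from {url}"] * 2, "url": [url] * 2})

corpus = scrape_many(["u1", "u2", "u3"], fake_scrape, delay=0)
print(corpus.shape)  # (6, 2)
```

In the notebook this would be called as `scrape_many(urls_ODD12, lambda u: create_dataset(u, columns))`.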

In [None]:
res_12.shape

(67, 19)

**Articles tagged almost exclusively with SDG 15**

In [None]:
urls_ODD15 = [
              "http://sdg.iisd.org/commentary/policy-briefs/sprep-report-assesses-state-of-environment-and-conservation-in-pacific-islands/",
              "http://sdg.iisd.org/commentary/policy-briefs/papua-new-guinea-works-to-improve-management-of-protected-areas/",
              "http://sdg.iisd.org/commentary/policy-briefs/a-un-high-level-week-with-meetings-and-moments-but-no-motorcades/",
              "http://sdg.iisd.org/commentary/policy-briefs/nature-waves-flags-of-warning-on-world-wildlife-day/"   
            ]

In [None]:
# DataFrame.append was removed in pandas 2.0: build one frame per URL and concatenate once
res_15 = pd.concat([create_dataset(url, columns) for url in urls_ODD15], ignore_index=True)

In [None]:
res_15.shape

(82, 19)

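**Articles selected for SDG 16** (two of these URLs also appear in `urls_ODD12`, and the first article's tags shown below do not include SDG 16)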

In [None]:
urls_ODD16 = [
  "http://sdg.iisd.org/commentary/policy-briefs/sdg-knowledge-weekly-transitioning-to-a-circular-economy-sustainable-consumption-and-production/",
  "http://sdg.iisd.org/commentary/policy-briefs/sdg-knowledge-weekly-circular-economy-transitions-and-resource-use/",
  "http://sdg.iisd.org/commentary/policy-briefs/monthly-forecast-november-2017/",
  "http://sdg.iisd.org/commentary/policy-briefs/shipping-partnership-advances-waste-management-in-pacific-islands/"
            ]

In [None]:
# DataFrame.append was removed in pandas 2.0: build one frame per URL and concatenate once
res_16 = pd.concat([create_dataset(url, columns) for url in urls_ODD16], ignore_index=True)

In [None]:
res_16.head(1)

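|   | text | SDG1 | SDG2 | SDG3 | SDG4 | SDG5 | SDG6 | SDG7 | SDG8 | SDG9 | SDG10 | SDG11 | SDG12 | SDG13 | SDG14 | SDG15 | SDG16 | SDG17 | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The 2019 World Resources Forum took place from... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | http://sdg.iisd.org/commentary/policy-briefs/s... |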


### Concatenation of the augmented datasets and creation of the final dataset 

In [None]:
final_dataframe_unlabelled = pd.concat([res_12,res_15,res_16]).drop('url', axis=1)

In [None]:
print(final_dataframe_unlabelled.shape)
final_dataframe_unlabelled.head(1)

(193, 18)


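|   | text | SDG1 | SDG2 | SDG3 | SDG4 | SDG5 | SDG6 | SDG7 | SDG8 | SDG9 | SDG10 | SDG11 | SDG12 | SDG13 | SDG14 | SDG15 | SDG16 | SDG17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | This week’s SDG Knowledge Weekly reviews publi... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |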
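Two of the SDG 16 URLs (`sdg-knowledge-weekly-circular-economy-transitions-and-resource-use` and `shipping-partnership-advances-waste-management-in-pacific-islands`) also appear in `urls_ODD12`, so once the `url` column is dropped the concatenated frame will typically contain repeated rows. A sketch for counting and dropping exact duplicates, shown on a made-up toy frame:

```python
import pandas as pd

toy = pd.DataFrame({
    "text":  ["circular economy", "waste in the Pacific", "circular economy"],
    "SDG12": [1, 1, 1],
    "SDG16": [0, 0, 0],
})

n_dupes = toy.duplicated().sum()                       # rows identical to an earlier row
deduped = toy.drop_duplicates().reset_index(drop=True)
print(n_dupes, deduped.shape)                          # 1 (2, 3)
```

Applied to `final_dataframe_unlabelled`, dropping duplicates before any train/test split also avoids the same paragraph landing on both sides.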


# Data Processing

---

We now process the test dataset with (almost) the same pipeline.


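### Normalizing text length

**As the word counts below show, some paragraphs are too long.**

We address this by cutting each paragraph at sentence boundaries (`'. '`) and re-grouping consecutive sentences into chunks of roughly 200 characters at most; a sentence that is longer than that on its own is kept whole.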


In [None]:
final_dataframe_unlabelled['text'].apply(lambda x: len(x.split())).value_counts().sort_index()

0       7
1       1
2       4
3      11
4       2
       ..
139     2
142     2
146     1
149     1
151     1
Name: text, Length: 83, dtype: int64

In [None]:
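# text length correction: split a paragraph at sentence boundaries ('. ') and
# re-group consecutive sentences into chunks shorter than `threshold` characters
# (a single sentence longer than `threshold` is kept whole).
# Chunks are joined with the '/c' marker, which the pipeline splits on later.
def longParagraphProcessign(paragraph, threshold=200):
  out = []
  for chunk in paragraph.strip().split('. '):
    chunk = chunk.strip().rstrip('.')
    if not chunk:
      continue
    chunk += '.'  # every chunk ends with exactly one period
    if out and len(chunk) + len(out[-1]) < threshold:
      out[-1] += ' ' + chunk
    else:
      out.append(chunk)
  return '/c'.join(out)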

# remove the residuals of the scraping: bracketed fragments such as "[1]",
# then every character that is not a letter or a digit
def removeDust(paragraph):
  paragraph = re.sub(r'\[[^\]]*\]', '', paragraph)  # one [...] group at a time
  paragraph = re.sub(r'[^a-zA-Z0-9]', ' ', paragraph)
  return paragraph


# True if the string contains something other than whitespace
def emptySpace(string):
  return bool(string.strip())

**Tests**

In [None]:
test = "tyui dertyu rtyui [ ertyui ] uyg [SFFghhfgv3456]"
removeDust(test)

'tyui dertyu rtyui  uyg '
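The bracket pattern matters: the class `[^)]` does not exclude `]`, so a match can run from the first `[` to the last `]` and swallow the text between two bracketed groups. Excluding `]` from the class keeps each match inside one pair of brackets. A small comparison with the standard `re` module (the `regex` module imported as `re` in this notebook accepts the same patterns):

```python
import re

sample = "keep [drop 1] keep too [drop 2]"

spanning = re.sub(r"\[[^)]*\]", "", sample)     # class excludes ')' only
per_pair = re.sub(r"\[[^\]]*\]", "", sample)    # class excludes ']' as well

print(repr(spanning))  # 'keep '
print(repr(per_pair))  # 'keep  keep too '
```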

In [None]:
test0 = """Rationalize inefficient fossil-fuel subsidies that encourage wasteful consumption by removing market distortions, in accordance with national circumstances, including by restructuring taxation and phasing out those harmful subsidies, where they exist, to reflect their environmental impacts, taking fully into account the specific needs and conditions of developing countries and minimizing the possible adverse impacts on their development in a manner that protects the poor and the affected communities. Rationalize inefficient fossil-fuel subsidies that encourage wasteful consumption by removing market distortions, in accordance with national circumstances, including by restructuring taxation and phasing out those harmful subsidies, where they exist, to reflect their environmental impacts, taking fully into account the specific needs and conditions of developing countries and minimizing the possible adverse impacts on their development in a manner that protects the poor and the affected communities.
        """
longParagraphProcessign(test0)

'Rationalize inefficient fossil-fuel subsidies that encourage wasteful consumption by removing market distortions, in accordance with national circumstances, including by restructuring taxation and phasing out those harmful subsidies, where they exist, to reflect their environmental impacts, taking fully into account the specific needs and conditions of developing countries and minimizing the possible adverse impacts on their development in a manner that protects the poor and the affected communities./cRationalize inefficient fossil-fuel subsidies that encourage wasteful consumption by removing market distortions, in accordance with national circumstances, including by restructuring taxation and phasing out those harmful subsidies, where they exist, to reflect their environmental impacts, taking fully into account the specific needs and conditions of developing countries and minimizing the possible adverse impacts on their development in a manner that protects the poor and the affected
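`longParagraphProcessign` budgets chunks in characters. If one prefers to cap chunks by word count, which matches the word-count distribution shown earlier, a variant could look like the sketch below. `chunk_by_words` and its `max_words` default are assumptions for illustration; it splits on the same `'. '` boundary and keeps a single over-long sentence whole.

```python
def chunk_by_words(paragraph, max_words=70):
    """Group consecutive sentences so that each chunk has at most max_words words
    (a single sentence longer than max_words is kept as its own chunk)."""
    chunks = []
    for sentence in paragraph.strip().split(". "):
        sentence = sentence.strip().rstrip(".")
        if not sentence:
            continue
        sentence += "."
        if chunks and len(chunks[-1].split()) + len(sentence.split()) <= max_words:
            chunks[-1] += " " + sentence
        else:
            chunks.append(sentence)
    return chunks

text = "One two three. Four five. Six seven eight nine."
print(chunk_by_words(text, max_words=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

Returning a list instead of a `'/c'`-joined string would let the pipeline call `explode` directly, without the marker round trip.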

**Dataset transformation**

In [None]:
def pipelineTransformation(final_dataframe_unlabelled):

  out = final_dataframe_unlabelled.copy()
  # Mark chunk boundaries with '/c' so that no chunk is too long
  out['text'] = out['text'].apply(longParagraphProcessign)
  
  # Split the paragraphs at the '/c' markers
  out["text"] = out["text"].str.split("/c")

  # One row per chunk (the labels are repeated)
  out = out.explode("text").reset_index(drop=True)

  # Remove the residuals noticed in the scraped text (bracketed fragments, non-alphanumeric characters)
  out['text'] = out['text'].apply(removeDust)

  # Drop the rows that are empty after cleaning
  out = out[out['text'].apply(emptySpace)]
  
  return(out)
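The split/explode steps turn one row holding several `'/c'`-separated chunks into several rows that repeat the row's labels. A toy illustration of just those two pandas calls, on made-up data:

```python
import pandas as pd

toy = pd.DataFrame({"text": ["a./cb./cc.", "d."], "SDG12": [1, 0]})

toy["text"] = toy["text"].str.split("/c")              # each cell becomes a list
exploded = toy.explode("text").reset_index(drop=True)  # one row per list element

print(exploded)
```

Without `reset_index(drop=True)` the exploded rows would keep the index of their source row (0, 0, 0, 1).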

In [None]:
df = pipelineTransformation(final_dataframe_unlabelled)

In [None]:
#df.to_csv('/content/drive/MyDrive/Hackathon_ISEP/Data/unlabeled_snaps.csv',sep=';')

In [None]:
testDataDir = '/content/drive/MyDrive/Hackathon_ISEP/Data/test/raw'

In [None]:
filePaths = os.listdir(testDataDir)
dataframe = pd.read_csv(testDataDir + '/' + filePaths[3])

In [None]:
#dataframe.head(100)

**Pipeline for scraped PDFs**

In [None]:
def pipelineTransformationBIS(final_dataframe_unlabelled):

  out = final_dataframe_unlabelled.copy()
  # Mark chunk boundaries with '/c' so that no chunk is too long
  out['Text'] = out['Text'].apply(longParagraphProcessign)
  
  # Split the paragraphs at the '/c' markers
  out["Text"] = out["Text"].str.split("/c")

  # One row per chunk (the other columns are repeated)
  out = out.explode("Text").reset_index(drop=True)

  # Remove the residuals noticed in the scraped text (bracketed fragments, non-alphanumeric characters)
  out['Text'] = out['Text'].apply(removeDust)

  # Drop the rows that are empty after cleaning
  out = out[out['Text'].apply(emptySpace)]
  
  return(out)


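def pipelineTestDir(pathDir):
  # Apply pipelineTransformationBIS to every file of `pathDir` and write the
  # results to `pathDir/processed/processed_<file name>`
  outDir = os.path.join(pathDir, 'processed')
  os.makedirs(outDir, exist_ok=True)
  for fileName in os.listdir(pathDir):
    filePath = os.path.join(pathDir, fileName)
    # skip sub-directories such as processed/
    if not os.path.isfile(filePath):
      continue
    # load the data
    dataframe = pd.read_csv(filePath)
    # apply the pipeline
    dataframe = pipelineTransformationBIS(dataframe)
    # create the processed file
    dataframe.to_csv(os.path.join(outDir, 'processed_' + fileName))
  print("Processing successful")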
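`pipelineTransformation` and `pipelineTransformationBIS` are identical apart from the name of the text column (`'text'` vs `'Text'`). One way to keep a single implementation is a small wrapper that renames the column, runs a pipeline written for `'text'`, and renames it back. `on_text_column` is a hypothetical helper, and a stub pipeline stands in for the real one in the usage example.

```python
import pandas as pd

def on_text_column(df, pipeline, text_col="text"):
    """Run `pipeline` (written for a 'text' column) on a frame whose text lives in `text_col`.
    Assumes the frame has no other column already called 'text'."""
    if text_col == "text":
        return pipeline(df)
    renamed = df.rename(columns={text_col: "text"})
    return pipeline(renamed).rename(columns={"text": text_col})

# Stub standing in for pipelineTransformation: upper-cases the text.
def stub_pipeline(df):
    out = df.copy()
    out["text"] = out["text"].str.upper()
    return out

pdf_frame = pd.DataFrame({"Text": ["from a pdf"], "page": [3]})
result = on_text_column(pdf_frame, stub_pipeline, text_col="Text")
print(result.columns.tolist(), result["Text"].iloc[0])  # ['Text', 'page'] FROM A PDF
```

With the real functions, `on_text_column(df, pipelineTransformation, text_col="Text")` would play the role of `pipelineTransformationBIS(df)`.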

**Uncomment and run the cell below to process the test files and create or update the processed dataframes**

In [None]:
#pipelineTestDir(testDataDir)

## Evaluation Dataset for SDG_12 target


---

**Creation of the 'raw' evaluation dataset for SDG_12**




In [None]:
dictio_odd12 = {'text': ['Improved in waste management also has the potential to create skilled and unskilled jobs in the waste and recycling sector and greater levels of social cohesion through engagement and behavior change around waste issues. This outcome aligns well with Liberia’s new Pro-Poor Agenda for Prosperity and Development (PPAPD) which was enacted following the change in government in 2018.',
                         'Dismantling and decontamination of the sulfuric and phosphoric acid plants . Provision of phosphoric acid and/or mono-ammonium phosphate. Rehabilitation of the gas treatment system in the nitric acid plant and the granulation units . 4. Installation of a scrubber or a granulator in the ammonium nitrate plant . Rehabilitation of the harbor reception facility in the port of Annaba .Spare parts and materials.Support to plant operation and management and workershealth and safety .',
                         'Current accounting and reporting techniques that have been designed for the linear economy are often ill-equipped to truly capture the value and positive impact of circular businesses. Circular accounting describes the practice of measur- ing, analysing and reporting on a company’s financial and non-financial performance, to truly reflect the value and impact of circular businesses on all relevant stakeholders. The transition to a circular economy will require rethinking our present way of doing business',
                         'We are living in a time of rampant pollution and waste, resource scarcity, biodiversity loss and rising global temperatures all of which are linked to our increasing consumption rates. Circular strategies and business models offer solutions, creating an economy that eliminates waste and pollution, keeps products and materials in use and regenerates nature. In a business-as-usual situation where we continue to live beyond the means of the planet, businesses will also suffer and be prone to a range of risks, including price volatility and supply chain failure. Circular businesses have proven to be resilient to such risks and will—in the long termamass more profits than their linear counterparts.',
                         'We must learn to appreciate and quantify the value generated with circular business models. This includes reassessing what we call waste and introducing concepts such as residual value. We should also move away from the existing approach whereby value is considered primarily in the short-term-products being purchased and then disposed of—to one where materials are kept in use for as long as possible',
                         'Circular revenue models allow us to capture value with circular strategies and can be distinguished depending on economic ownership structure. Examples include deposit models, lease and rent models, the Sell-and-Buy Back model (where the user becomes a temporary economic owner and may sell the product back to the producer), as well as the Product-as-a-Service (PaaS) model, where the economic owner of a product is entitled to the use-value of an object'
                          ]}  
df_eval_odd12 = pd.DataFrame(dictio_odd12)
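Python silently joins adjacent string literals, so a missing comma between two entries of such a list merges two snippets into one row instead of raising an error. A cheap guard is to assert, right after building the frame, that it has one row per literal in the list. The sketch below shows the pitfall on made-up strings.

```python
snippets_ok = ["First snippet.", "Second snippet."]
snippets_bad = ["First snippet."
                "Second snippet."]          # no comma: the two literals are concatenated

print(len(snippets_ok), len(snippets_bad))  # 2 1
print(snippets_bad[0])                      # First snippet.Second snippet.
```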

**Pipeline processing of the evaluation dataset for SDG_12**


In [None]:
df = pipelineTransformation(df_eval_odd12)

**Write the final evaluation dataset for SDG_12**




In [None]:
#df.to_csv('/content/drive/MyDrive/Hackathon_ISEP/Data/test/manual/processed_odd12.csv',sep=';')

## Evaluation Dataset for SDG_15 target

---



**Creation of the 'raw' evaluation dataset for SDG_15**




In [None]:
dictio_odd15 = {'text': ['Improving the management of natural resources at the landscape level is important to enhance the country’s resilience to weather-related events and for providing economic opportunities for rural economies. Healthy agriculture and forest landscapes can offset some of the impacts of climate-related disasters by enhancing the forest ecosystem’s resilience to changing weather patterns, providing important safety nets for local communities to cope with climate shocks, enhancing the productivity of farming systems, and reducing damage from flooding and sea level rise, among others. In addition, forest landscapes provide key ecosystem services such as biodiversity habitat, water filtration and availability, increased food security, soil erosion control, and reduction of Greenhouse Gases (GHG) emissions. The unsustainable management of these natural assets negatively affect rural jobs and revenue generation, impacting economic growth and disproportionally affecting the rural poor and vulnerable communities. The COVID-19 pandemic precipitates deforestation and forest degradation associated to an increased internal demand for food, raw material, and commodities, often satisfied through unsustainable farming systems - by August 2020, global deforestation rates had increased 77 percent during the pandemic.',
                         'Sustainable natural resources management and land restoration is the joint responsibility of two ministries. The Ministry of Environment and Natural Resources (Ministerio de Ambiente y Recursos Naturales, MARN) is responsible for implementing the National Environmental Policy, which provides the framework for sustainable use of natural resource protection, conservation, and restoration of the environment. The Ministry of Agriculture and Livestock (Ministerio de Agricultura y Ganadería, MAG) administers the regulations for agriculture, irrigation, forestry, fisheries, and aquaculture. MAG implements national regulations and planning through the General Directorate of Forestry, Watersheds, and Irrigation (Dirección General Ordenamiento Forestal, Cuencas, y Riego, DGFCR) and provides extension and technology transfer through the National Agricultural and Forestry Technology Center (Centro Nacional de Tecnología Agropecuaria y Forestal, CENTA). Locally, municipalities and local governments implement projects in their territories according to Local Sustainable Development Plans (Planes Locales de Desarrollo Sostenible, PDLS). MARN and MAG coordinate with Non-Governmental Organizations (NGOs), international development agencies, and local stakeholders for the implementation of land restoration policies seeking to promote rural development and ecosystems’ adaptation to the impacts of climate change.',
                         'The Government of El Salvador (GoES) is advancing implementation of existing policy instruments, while exploring opportunities to design fiscal policy reforms for a green economy. Implementation builds on extensive experience gained since more than two decades ago through projects supported by the Global Environment Facility (GEF) on environmental management and strategic planning; capacity building on biodiversity conservation and protected areas management; promotion of biodiversity conservation in coffee landscapes and markets for biodiversity; adaptation to climate change; and testing models for integrated management of protected areas, among others. Furthermore, in the 2014-2020 period, El Salvador implemented restauration actions in 241,662 hectares (ha) of degraded lands.37 Most recent actions have been informed by in-depth participatory analysis such as (i) a comprehensive assessment of historic trends and causes of deforestation and forest degradation, carried out in preparation for the EN-REP design with support from the Forest Carbon Partnership Facility (FCPF) REDD+ Readiness Preparation Project (P124935); (ii) application of the Restoration Opportunity Assessment Methodology (ROAM38) methodology to identify cost-effective options for land restoration at national scale, with support from the International Union for Conservation of Nature (IUCN) and other development agencies; and (iii) the development and testing of the Restoration Sustainability index (Indice de Sustentabilidad de la Restauración, ISR), with support from the World Resources Institute (WRI). Currently, the GoES is preparing the €0.8 million Project “Fiscal policy reform for a Green Economy and NDC implementation: Restoration and Sustainable Landscape management in El Salvador”, with financial support from the International Climate Initiative (IKI39).',
                         'South Africa’s is one of most biodiverse countries in the world, and its biodiversity contributes significantly to the national economy, and to local livelihoods. With a varied geography ranging from plains and savannas to deserts and high mountains, South Africa’s ecosystems support over 95,000 species, and its rich biodiversity contributes significantly to the national economy, particularly through nature-based tourism.',
                         'To leverage financial resources and improve capacity to implement the Biodiversity Economy and increase benefits from selected PA landscapes to local communities.',
                         'The proposed Project Development Objective (PDO) is to restore degraded land in El Imposible – Barra de Santiago Conservation Area.'
                         ]}  
df_eval_odd15 = pd.DataFrame(dictio_odd15)

**Pipeline processing of the evaluation dataset for SDG_15**


In [None]:
df = pipelineTransformation(df_eval_odd15)

**Write the final evaluation dataset for SDG_15**

In [None]:
#df.to_csv('/content/drive/MyDrive/Hackathon_ISEP/Data/test/manual/processed_odd15.csv',sep=';')

## Evaluation Dataset for SDG_16 target

---

**Because SDG 16 is under-represented in the external sources, we picked texts describing NGO actions**

**Creation of the 'raw' evaluation dataset for SDG_16**




In [None]:
dictio_odd16 = {'text': ['The proposed Post-Conflict Economic Rehabilitation Credit (Post-ConflictERC) is an integralpart of the Banks Transitional Support Strategy (TSS) considered by the Board on January 16, 2001, to assist the Republic of Congo in the transition from war to sustainable peace, in the context of the Governments 2000- 2002InterimPost-ConflictProgram',
                         'Itisthefirst operationof the TSS that would support implementationof: (i) key structural reforms; and (ii) improved governance and better transparency in the management of the country natural wealth and public funds.',
                         'End Corporal Punishment is catalysing progress towards universal prohibition and elimination of violent punishment of children. As a critical initiative of the End Violence Partnership, it is advocating for full and comprehensive law reform to prohibit corporal punishment, raising awareness about the issue, monitoring law throughout the world, and promoting action and implementation. We are working with partners across the world to make violent discipline of children by caregivers a thing of the past.',
                         'Pathfinding countries use the INSPIRE Seven strategies for Ending Violence Against Children to understand the drivers of violence and build integrated responses that improve the lives of children and young people. These strategies are seen throughout the Pathfinding process, including but not limited to the creation of a countrys national action plan to end violence.'
                         ]}  
df_eval_odd16 = pd.DataFrame(dictio_odd16)

**Pipeline processing of the evaluation dataset for SDG_16**


In [None]:
df = pipelineTransformation(df_eval_odd16)

**Write the final evaluation dataset for SDG_16**

In [None]:
#df.to_csv('/content/drive/MyDrive/Hackathon_ISEP/Data/test/manual/processed_odd16.csv',sep=';')