# EWS Project Data - Sample Review

In [52]:
import pandas as pd
import numpy as np
import os
from collections import Counter
%matplotlib inline
pd.set_option('display.max_rows', 300)

**Note: this encoding may be necessary because the CSV contains non-ASCII characters.**

In [54]:
projects = pd.read_csv('../Data/EWS_Published Project_Listing_DD.csv', encoding='ISO-8859-1')
# Keep only the rows that have an EWS ID
projects = projects[projects['EWS ID'].notnull()]
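
The encoding note can be illustrated without the CSV itself. The sketch below uses a hypothetical helper, `decode_with_fallback`, and a made-up byte string. It shows why a strict UTF-8 read can fail on Latin-1 bytes while `ISO-8859-1` always succeeds. Because `ISO-8859-1` maps every byte to some character, it can also silently mis-decode text that is really in another encoding, so the result is worth a spot check.

```python
def decode_with_fallback(raw, encodings=("utf-8", "ISO-8859-1")):
    """Try each encoding in order and return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the encodings could decode the input")


# Hypothetical sample: a country name saved as Latin-1 rather than UTF-8.
raw = "C\u00f4te d'Ivoire".encode("ISO-8859-1")
text, used = decode_with_fallback(raw)
print(text, used)
```

If the fallback encoding is the one that ends up being used, that is a hint the file was not written as UTF-8.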

In [55]:
projects.shape

(6839, 62)

In [56]:
projects.head()

Unnamed: 0,EWS ID,ProjectNumber,Published,Bank Risk Rating,Project Status,EWS URL,Detailed Analysis URL,Project Name,City,Country Count,...,Sector 7,Last Edited,Date Scraped,Date Disclosed,Board Date,Source URL,Project Cost,Investment Amount,Project Description,Contact Information
0,29164,AFDB-P-TN-BB0-007,Published,U,Proposed,https://ews.rightsindevelopment.org/projects/p...,,TUNISIA FERTILIZER PROJECT,,1,...,,9/4/17,8/15/17,12/13/01,12/13/01,http://www.afdb.org/en/projects-and-operations...,,,,ACCOUNTABILITY MECHANISM OF AfDB\r\r\r\rThe In...
1,29166,AFDB-P-SZ-HAA-001,Published,U,Approved,https://ews.rightsindevelopment.org/projects/p...,,LINE OF CREDIT TO SWAZILAND DEVELOPMENT FINANC...,,1,...,,9/4/17,8/15/17,12/13/01,5/12/17,http://www.afdb.org/en/projects-and-operations...,4.76,1.36,,MACHARIA Lilian Wanjiru - PIFD1\r\r\r\rACCOUNT...
2,29931,IADB-UR-T1100,Pending,C,Approved,https://ews.rightsindevelopment.org/projects/u...,,Supporting INEFOP in Improving Labor Training ...,,1,...,,,10/3/17,12/31/99,7/16/13,http://www.iadb.org/en/projects/project-descri...,0.44,0.44,,
3,30104,IADB-BR-T1279,Pending,C,Approved,https://ews.rightsindevelopment.org/projects/b...,,"Racial Equality and Social, Economic, Politica...",,1,...,,,10/3/17,12/31/99,6/4/13,http://www.iadb.org/en/projects/project-descri...,0.97,0.82,,
4,30322,IADB-PE-T1297,Pending,C,Approved,https://ews.rightsindevelopment.org/projects/p...,,Adaptation to Climate Change of the Fishery Se...,,1,...,,,10/3/17,12/31/99,12/4/13,http://www.iadb.org/en/projects/project-descri...,1.5,1.5,,


**Null Check**

In [29]:
# Fraction of non-null values in each column
projects.count()/len(projects.index)

EWS ID                   1.000000
ProjectNumber            1.000000
Published                1.000000
Bank Risk Rating         1.000000
Project Status           0.937564
EWS URL                  1.000000
Detailed Analysis URL    0.000000
Project Name             0.999854
City                     0.169762
Country Count            1.000000
Country 1                0.909782
Country 2                0.035824
Country 3                0.019886
Country 4                0.011844
Country 5                0.007750
Country 6                0.005118
Country 7                0.003363
Country 8                0.001901
Country 9                0.001608
Country 10               0.001462
Country 11               0.001024
Country 12               0.000585
Borrower or Client       0.761807
Private Actor Count      0.999123
Private Actor 1          0.077204
Private Actor 2          0.017985
Private Actor 3          0.006580
Private Actor 4          0.003071
Private Actor 5          0.002193
Private Actor 
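
The same fill rate can be computed and sorted in one step, which makes the sparsest columns easier to spot. This is a minimal sketch on a toy frame with made-up values standing in for `projects`; only the column names are borrowed from the output above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `projects`; the values are made up for illustration.
toy = pd.DataFrame({
    "EWS ID": [1, 2, 3, 4],
    "City": ["Abidjan", np.nan, np.nan, np.nan],
    "Detailed Analysis URL": [np.nan] * 4,
})

# Fraction of non-null values per column, the same quantity as count() / len().
filled = toy.notnull().mean().sort_values()
print(filled)

# Columns with no values at all are candidates to drop.
empty_cols = filled[filled == 0].index.tolist()
print(empty_cols)
```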

## Project Description Column Will Likely Be Most Useful for Matching

**Notes**

* Some descriptions are very short, so matching against them may be unreliable.
* Other fields (Country, Borrower or Client, etc.) will likely be useful as well.
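
One way to put a number on the "short description" concern is to count words per description and flag the ones under a cutoff. This is a sketch only: the sample texts are made up, and the `MIN_WORDS` threshold is an assumption rather than something derived from the data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for projects['Project Description']; the texts are made up.
descriptions = pd.Series([
    "Line of credit.",
    "The project will finance construction of a shopping mall and an access road.",
    np.nan,
])

MIN_WORDS = 5  # assumed cutoff, not taken from the data

# Word count per description; missing descriptions count as zero words.
word_counts = descriptions.fillna("").str.split().str.len()
too_short = word_counts < MIN_WORDS
print(word_counts.tolist())
print(too_short.tolist())
```

Rows flagged as too short could then fall back to the other fields for matching.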

In [57]:
for i in projects.sample(15)['Project Description']:
    print(i)
    print('*****\n')

Community Health and SafetyThe Abidjan shopping mall is located in an urban area, and adjacent to a school. To increase traffic safety during operation, CFAO Retail built a small access road in front of the shopping mall and installed signage limiting speed around it.To address potential fire and life safety risks, CFAO Retail used French Standards for the Abidjan project. For each development, SGI shall provide a formal statement from a suitably qualified fire safety professional acceptable to IFC that the life and fire safety-related aspects of the building and fire safety system designs meet all local life and fire safety regulations and an internationally-accepted life safety code.  Following completion of construction and before public opening of any mall, SGI shall provide a second certification from a suitably qualified professional acceptable to IFC that the building and its fire safety systems were constructed according to the previously verified design or alternatively identi

## Looking at the Sector Data 

This dataset could help in tagging the news articles with sector information.

In [30]:
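def get_category_cols(category, additional_removes=None):
    """Return the columns whose name contains `category`, minus '<category> Count'."""
    cols = [i for i in projects.columns if category in i]
    cols.remove(category + ' Count')
    # Drop any other columns that match by name but are not category slots
    for col in additional_removes or []:
        cols.remove(col)
    return cols

sector_cols = get_category_cols('Sector')
# .values replaces .as_matrix(), which was removed in pandas 1.0
all_sectors = projects[sector_cols].values.flatten()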


In [31]:
for i in Counter(all_sectors): print(i)

nan
Climate and Environment
Mining
Infrastructure
Finance
Humanitarian Response
Agriculture and Forestry
Transport
Education and Health
Hydropower
Communications
Construction
Technical Cooperation
Water and Sanitation
Energy
Industry and Trade
Law and Government
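
The listing above shows the distinct sectors but not how often each occurs, and it includes `nan`. A possible variant, sketched here on a made-up frame standing in for `projects[sector_cols]`, drops the empty slots and tallies the rest with `value_counts()`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for projects[sector_cols]; the rows are made up.
toy = pd.DataFrame({
    "Sector 1": ["Energy", "Mining", "Energy"],
    "Sector 2": ["Hydropower", np.nan, np.nan],
})

# Flatten all sector slots into one Series, drop the empty ones,
# then count how many times each sector appears.
sector_counts = pd.Series(toy.values.ravel()).dropna().value_counts()
print(sector_counts)
```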


--------------

# NOTE 

Might be able to use this dataset to classify the Bank and Country as well.

**Countries**

In [32]:
country_cols = get_category_cols('Country')
all_countries = projects[country_cols].values.flatten()
country_counter = Counter(all_countries)
print(len(country_counter))
country_counter

180


Counter({nan: 75228,
         u'Afghanistan': 48,
         u'Albania': 27,
         u'Algeria': 3,
         u'Angola': 8,
         u'Argentina': 163,
         u'Armenia': 50,
         u'Austria': 24,
         u'Azerbaijan': 31,
         u'Bahamas': 30,
         u'Bangladesh': 140,
         u'Barbados': 14,
         u'Belarus': 17,
         u'Belgium': 25,
         u'Belize': 17,
         u'Benin': 13,
         u'Bhutan': 36,
         u'Bolivia': 106,
         u'Bosnia and Herzegovina': 34,
         u'Botswana': 5,
         u'Brazil': 213,
         u'Bulgaria': 22,
         u'Burkina Faso': 23,
         u'Burundi': 9,
         u'Cambodia': 72,
         u'Cameroon': 26,
         u'Canada': 1,
         u'Cape Verde': 6,
         u'Central African Republic': 16,
         u'Chad': 17,
         u'Chile': 82,
         u'China': 255,
         u'Colombia': 168,
         u'Comoros': 2,
         u'Congo, Democratic Republic of': 26,
         u'Congo, Republic of': 9,
         u'Cook Islands': 1,


**Banks**

In [33]:
banks_cols = get_category_cols('Bank', ['Bank Risk Rating'])
all_banks = projects[banks_cols].values.flatten()
all_banks = [i for i in all_banks if pd.notnull(i)]  # Nulls behave oddly in this one, so drop them before counting
banks_counter = Counter(all_banks)
print(len(banks_counter))
banks_counter

13


Counter({u'African Development Bank (AFDB)': 86,
         u'Asian Development Bank (ADB)': 1159,
         u'Asian Infrastructure Investment Bank (AIIB)': 62,
         u'European Bank for Reconstruction and Development (EBRD)': 496,
         u'European Investment Bank (EIB)': 1029,
         u'Green Climate Fund (GCF)': 72,
         u'Inter-American Development Bank (IADB)': 1585,
         u'Inter-American Investment Corporation (IIC)': 135,
         u'International Finance Corporation (IFC)': 1057,
         u'Multilateral Investment Guarantee Agency (MIGA)': 88,
         u'Netherlands Development Finance Company (FMO)': 249,
         u'New Development Bank (NDB)': 13,
         u'World Bank (WB)': 888})
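
One possible explanation for the odd null behaviour mentioned in the code comment, not confirmed from the data, is how `Counter` treats NaN. NaN is never equal to itself, so separately created NaN objects end up as separate keys, while repeats of the very same NaN object are merged. The sketch below demonstrates this with the standard library only; the sample values are made up.

```python
from collections import Counter

# Two separately created NaN objects: NaN != NaN, so Counter
# stores them under two different keys.
distinct = Counter([float("nan"), float("nan")])
print(len(distinct))

# The same NaN object repeated: dict lookup checks identity first,
# so both occurrences are counted under one key.
nan = float("nan")
shared = Counter([nan, nan])
print(len(shared))

# Filtering before counting sidesteps the issue either way.
values = ["World Bank (WB)", float("nan"), "World Bank (WB)"]
clean = Counter(v for v in values if v == v)  # NaN is the only value != itself
print(clean)
```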

In [34]:
for b in banks_counter: print(b)

Green Climate Fund (GCF)
European Bank for Reconstruction and Development (EBRD)
International Finance Corporation (IFC)
Asian Development Bank (ADB)
Asian Infrastructure Investment Bank (AIIB)
New Development Bank (NDB)
African Development Bank (AFDB)
European Investment Bank (EIB)
Inter-American Development Bank (IADB)
Netherlands Development Finance Company (FMO)
World Bank (WB)
Inter-American Investment Corporation (IIC)
Multilateral Investment Guarantee Agency (MIGA)
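
Since every bank label follows the pattern `Full Name (ACRONYM)`, the labels could be split into a name and an acronym and used to spot bank mentions in article text. This is a rough sketch: `split_label` and `find_banks` are hypothetical helpers, the sample sentence is made up, and plain substring matching like this would need tuning against real articles.

```python
import re

# A few of the bank labels from the output above.
bank_labels = [
    "World Bank (WB)",
    "International Finance Corporation (IFC)",
    "Asian Development Bank (ADB)",
]


def split_label(label):
    """Split 'Full Name (ACRONYM)' into (full_name, acronym)."""
    match = re.match(r"^(.*)\s+\(([^()]+)\)$", label)
    if match is None:
        raise ValueError("unexpected label format: %r" % label)
    return match.group(1), match.group(2)


def find_banks(text, labels):
    """Return the labels whose full name or acronym appears in the text."""
    found = []
    lowered = text.lower()
    for label in labels:
        name, acronym = split_label(label)
        name_hit = name.lower() in lowered
        # Acronyms are matched case-sensitively on word boundaries.
        acronym_hit = re.search(r"\b%s\b" % re.escape(acronym), text) is not None
        if name_hit or acronym_hit:
            found.append(label)
    return found


# Hypothetical article sentence.
sample = "The IFC and the Asian Development Bank co-financed the dam."
print(find_banks(sample, bank_labels))
```

The same pattern could be applied to the country names, although those have no acronym to fall back on.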


------------