# Problem 10: Political Network Connections

In this problem, you will analyze the connections, and their strengths, between the people and organizations in *Trump World* using a combination of hash tables (i.e., dictionaries) and pandas data frames.

## The dataset

The dataset for this problem is built from public records, news reports, and other sources on Donald Trump's family, his Cabinet picks, and his top advisers. It covers more than 1,500 people and organizations altogether.

Each row represents a connection between a person and an organization (e.g., The Trump Organization Inc. and Donald J. Trump), a person and another person (e.g., Donald J. Trump and Linda McMahon), or two organizations (e.g., Bedford Hills Corp. and Seven Springs LLC).

Source: https://www.buzzfeednews.com/article/johntemplon/help-us-map-trumpworld

Before starting, please run the following cell to set up the environment and import the data to `Network`.

In [1]:
import math
import pandas as pd
import numpy as np
from collections import defaultdict

Network = pd.read_csv("./resource/asnlib/publicdata/network/network.csv", encoding='latin-1' )
assert len(Network) == 3380
Network.head()

Unnamed: 0,Entity A Type,Entity A,Entity B Type,Entity B,Connection_Strength
0,Organization,4 SHADOW TREE LANE MEMBER CORP.,Organization,4 SHADOW TREE LANE LLC,0.469155
1,Organization,40 WALL DEVELOPMENT ASSOCIATES LLC,Organization,40 WALL STREET LLC,0.03548
2,Organization,40 WALL STREET LLC,Organization,40 WALL STREET COMMERCIAL LLC,0.177874
3,Organization,40 WALL STREET MEMBER CORP.,Organization,40 WALL STREET LLC,0.236508
4,Organization,401 MEZZ VENTURE LLC,Organization,401 NORTH WABASH VENTURE LLC,0.169532


**Exercise 0** (1 point). Create a subset of the data frame named `Network_sub`, keeping only the records where `Entity B` contains the keyword "TRUMP" (case-insensitive).
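
As a minimal sketch of case-insensitive substring filtering, the example below applies `Series.str.contains` to a small toy data frame. The rows in `toy` are hypothetical and are not taken from `network.csv`; the names `toy`, `mask`, and `toy_sub` are assumptions made for illustration only.

```python
import pandas as pd

# Toy data (hypothetical rows, not from network.csv)
toy = pd.DataFrame({
    "Entity A": ["ALPHA LLC", "BETA INC.", "GAMMA CORP.", "DELTA LLC"],
    "Entity B": ["TRUMP TOWER LLC", "Trump Hotels", "ACME HOLDINGS", None],
})

# case=False  -> match regardless of letter case
# regex=False -> treat the keyword as a literal substring
# na=False    -> treat missing values as non-matches
mask = toy["Entity B"].str.contains("TRUMP", case=False, regex=False, na=False)
toy_sub = toy[mask]

print(toy_sub["Entity A"].tolist())  # ['ALPHA LLC', 'BETA INC.']
```

Without `case=False`, the mixed-case row `"Trump Hotels"` would be excluded.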

In [2]:
# Store the subset in Network_sub
###
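# case=False makes the keyword match case-insensitive, as the exercise requires
Network_sub = Network[Network['Entity B'].str.contains('TRUMP', case=False, na=False)]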

###


In [3]:
# Test cell: `test_subset`

assert type(Network_sub)==pd.DataFrame, "Your subset is not a pandas DataFrame"
assert list(Network_sub)==['Entity A Type','Entity A','Entity B Type','Entity B','Connection_Strength'], "Your subset columns are not consistent with the master dataset"
assert len(Network_sub)==648, "The length of your subset is not correct"

test = Network_sub.sort_values(by='Connection_Strength')
test.reset_index(drop=True, inplace=True)
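# Compare floats with a tolerance rather than exact equality
assert math.isclose(test.loc[0,'Connection_Strength'], 0.001315204, rel_tol=1e-13)
assert math.isclose(test.loc[200,'Connection_Strength'], 0.312599997, rel_tol=1e-13)
assert math.isclose(test.loc[400,'Connection_Strength'], 0.610184514, rel_tol=1e-13)
assert math.isclose(test.loc[647,'Connection_Strength'], 0.996641965, rel_tol=1e-13)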

print("\n(Passed.)")


(Passed.)


Now, let's take a look at part of the `Network_sub` data.

In [4]:
Network_sub.iloc[25:36]

Unnamed: 0,Entity A Type,Entity A,Entity B Type,Entity B,Connection_Strength
232,Person,BRIAN BAUDREAU,Organization,"THE TRUMP ORGANIZATION, INC.",0.249506
237,Organization,"BRIARCLIFF PROPERTIES, INC.",Organization,TRUMP BRIARCLIFF MANOR DEVELOPMENT LLC,0.102998
238,Person,BRITTANY HEBERT,Organization,THE ERIC TRUMP FOUNDATION,0.724913
257,Person,CARTER PAGE,Person,DONALD J. TRUMP,0.694884
280,Person,CHARLES P. REISS,Organization,"THE TRUMP ORGANIZATION, INC.",0.937458
283,Person,CHEN SITING AKA CHARLYNE CHEN,Organization,TRUMP ORGANIZATION LLC,0.137199
284,Organization,"CHEVY CHASE TRUST HOLDINGS, INC.",Organization,TRUMP NATIONAL GOLF CLUB WASHINGTON DC LLC,0.925422
286,Person,CHLOE MURDOCH,Person,IVANKA TRUMP,0.805567
294,Person,CHRISTL MAHFOUZ,Organization,THE ERIC TRUMP FOUNDATION,0.42678
326,Organization,DAEWOO AMERICA DEVELOPMENT (NEW YORK) CORP,Organization,TRUMP KOREA LLC,0.994785


**Exercise 1** (4 points). Write a function 

```python
def Connection_Strength(Network_sub, Entity_B_Type)
```

that takes two inputs

1. `Network_sub` is the dataset you get from exercise 0
2. `Entity_B_Type` can take two values: either `Person` or `Organization`

and, for every entity A that is connected to at least one entity B of the given type, returns a nested dictionary (i.e., a dictionary of dictionaries) of the form:

```python
{Entity A: {Entity B: Connection_Strength, Entity B: Connection_Strength}, ... }
```

For example, when `Entity_B_Type` is `Person`, the function returns something like the following:

```python
{'DONALD J. TRUMP': {'DONALD TRUMP JR.': 0.453990548,
  'ERIC TRUMP': 0.468002101,
  'IVANKA TRUMP': 0.773874808,
  'MARYANNE TRUMP BARRY': 0.330120053,
  'MELANIA TRUMP': 0.5171444000000001},
 'DONALD J. TRUMP FOR PRESIDENT, INC.': {'DONALD J. TRUMP': 0.377887355},
 'DONALD TRUMP JR.': {'ERIC TRUMP': 0.405052388, 'VANESSA TRUMP': 0.025756815},
 'GRACE MURDOCH': {'IVANKA TRUMP': 0.966637541},
 'IVANKA M. TRUMP BUSINESS TRUST': {'IVANKA TRUMP': 0.141785871}, ...}
```
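
One possible way to build a nested dictionary of this shape is to group the filtered rows by `Entity A` and zip the remaining two columns together. The sketch below is illustrative only: the toy rows are hypothetical, and `connection_strength_sketch` is an assumed helper name, not the required `Connection_Strength` function.

```python
import pandas as pd

# Toy data (hypothetical rows, not from network.csv)
toy = pd.DataFrame({
    "Entity A": ["A1", "A1", "A2", "A3"],
    "Entity B Type": ["Person", "Person", "Person", "Organization"],
    "Entity B": ["B1", "B2", "B1", "B3"],
    "Connection_Strength": [0.25, 0.5, 0.75, 1.0],
})

def connection_strength_sketch(df, entity_b_type):
    """Return {Entity A: {Entity B: strength}} for rows of the given Entity B type."""
    rows = df[df["Entity B Type"] == entity_b_type]
    return {
        entity_a: dict(zip(group["Entity B"], group["Connection_Strength"]))
        for entity_a, group in rows.groupby("Entity A")
    }

person = connection_strength_sketch(toy, "Person")
# A3 is absent because its only connection is to an Organization
assert person == {"A1": {"B1": 0.25, "B2": 0.5}, "A2": {"B1": 0.75}}
```

Grouping once avoids re-filtering the data frame for every distinct `Entity A`, which may matter on larger inputs.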

In [5]:
def Connection_Strength(Network_sub, Entity_B_Type):
    assert type(Entity_B_Type) == str
    assert Entity_B_Type in ['Person', 'Organization']
    ###

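    dico = {}  # Outer dictionary: {Entity A: {Entity B: Connection_Strength}}

    # Keep only the rows whose Entity B is of the requested type
    Network_sub = Network_sub[Network_sub['Entity B Type'] == Entity_B_Type]
    EntityAs = set(Network_sub['Entity A'])
    for A in EntityAs:
        # All connections that start from this Entity A
        dfa = Network_sub[Network_sub['Entity A'] == A]
        dici = {}  # Inner dictionary: {Entity B: Connection_Strength}
        for i in range(len(dfa)):
            dici[dfa.iloc[i]['Entity B']] = dfa.iloc[i]['Connection_Strength']
        dico[A] = dici
    return dico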
    ###


In [6]:
# Test Cell: `Connection_Strength`

# Create a dictionary 'Person' for entity B of type person
Person = Connection_Strength(Network_sub, 'Person')
# Create a dictionary 'Organization' for entity B of type organization
Organization = Connection_Strength(Network_sub, 'Organization')

# isinstance also accepts dict subclasses such as defaultdict
assert isinstance(Person, dict), "Your function does not return a dictionary"
assert len(Person)==17, "Your result is wrong for entity B of type person"
assert len(Organization)==296, "Your result is wrong for entity B of type organization"

assert Person['DONALD J. TRUMP']=={'DONALD TRUMP JR.': 0.453990548,'ERIC TRUMP': 0.468002101,'IVANKA TRUMP': 0.773874808,
  'MARYANNE TRUMP BARRY': 0.330120053,'MELANIA TRUMP': 0.5171444000000001}, "Wrong result"
assert Person['DONALD J. TRUMP FOR PRESIDENT, INC.']=={'DONALD J. TRUMP': 0.377887355}, "Wrong result"
assert Person['WENDI DENG MURDOCH']=={'IVANKA TRUMP': 0.669636181}, "Wrong result"

assert Organization['401 MEZZ VENTURE LLC']=={'TRUMP CHICAGO RETAIL LLC': 0.85298544}, "Wrong result"
assert Organization['ACE ENTERTAINMENT HOLDINGS INC']=={'TRUMP CASINOS INC.': 0.202484568,'TRUMP TAJ MAHAL INC.': 0.48784823299999996}, "Wrong result"
assert Organization['ANDREW JOBLON']=={'THE ERIC TRUMP FOUNDATION': 0.629688777}, "Wrong result"

print("\n(Passed.)")


(Passed.)


**Exercise 2** (1 point). For the dictionary `Organization` **created in the above test cell**, create another dictionary `Organization_avg` which for every entity A gives the average connection strength (i.e., the average of nested dictionary values). `Organization_avg` should be in the following form:
```python
{Entity A: avg_Connection_Strength, Entity A: avg_Connection_Strength, ... }
```
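
As a small worked example of averaging the values of each inner dictionary, the sketch below uses a dictionary comprehension on hypothetical toy values. The names `toy_org` and `toy_avg` are assumptions for illustration and are not part of the exercise.

```python
# Toy nested dictionary (hypothetical values, not from the dataset)
toy_org = {
    "A1": {"B1": 0.25, "B2": 0.75},
    "A2": {"B3": 0.5},
}

# Average of the inner values for every outer key
toy_avg = {a: sum(inner.values()) / len(inner) for a, inner in toy_org.items()}

print(toy_avg)  # {'A1': 0.5, 'A2': 0.5}
```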


In [7]:
###
Organization_avg  = {}
for i, dic in Organization.items():
    Organization_avg[i] = sum(dic.values())/len(dic)

###


In [8]:
# Test Cell: `Organization_avg`
assert isinstance(Organization_avg, dict), "Organization_avg is not a dictionary"
assert len(Organization_avg)==len(Organization)

for k_, v_ in {'401 MEZZ VENTURE LLC': 0.85298544,
               'DJT HOLDINGS LLC': 0.5855800477222223,
               'DONALD J. TRUMP': 0.4878277050144927,
               'JAMES BURNHAM': 0.187474088}.items():
    print(k_, Organization_avg[k_], v_)
    assert math.isclose(Organization_avg[k_], v_, rel_tol=4e-15*len(Organization[k_])), \
           "Wrong result for '{}': Expected {}, got {}".format(k_, v_, Organization_avg[k_])

print("\n(Passed.)")

401 MEZZ VENTURE LLC 0.85298544 0.85298544
DJT HOLDINGS LLC 0.5855800477222224 0.5855800477222223
DONALD J. TRUMP 0.4878277050144922 0.4878277050144927
JAMES BURNHAM 0.187474088 0.187474088

(Passed.)


**Exercise 3** (4 points). Based on the `Organization_avg` dictionary you just created, determine which organizations have an average connection strength that is strictly greater than a given threshold, `THRESHOLD` (defined in the code cell below). Then, create a new data frame named `Network_strong` that has a subset of the rows of `Network_sub` whose `Entity A` values match these organizations **and** whose `"Entity B Type"` equals `"Organization"`.
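
The two-part filter described above can be sketched with a boolean mask built from `Series.isin`. This is only one possible approach, shown on hypothetical toy data; `toy`, `toy_avg`, `toy_threshold`, and `toy_strong` are assumed names for illustration.

```python
import pandas as pd

# Toy data (hypothetical rows, not from network.csv)
toy = pd.DataFrame({
    "Entity A": ["A1", "A1", "A2", "A3"],
    "Entity B Type": ["Organization", "Organization", "Organization", "Person"],
    "Entity B": ["B1", "B2", "B3", "B4"],
    "Connection_Strength": [0.9, 0.3, 0.4, 0.8],
})
toy_avg = {"A1": 0.6, "A2": 0.4}  # average strength per Entity A (organization links only)
toy_threshold = 0.5

# Entities whose average is strictly greater than the threshold
strong_names = {a for a, avg in toy_avg.items() if avg > toy_threshold}

# Keep rows of those entities whose Entity B is an organization
keep = toy["Entity A"].isin(strong_names) & (toy["Entity B Type"] == "Organization")
toy_strong = toy[keep]

print(toy_strong["Entity B"].tolist())  # ['B1', 'B2']
```

Note that the row with strength 0.3 is kept: the threshold applies to the average for `A1`, not to each individual connection. A mask also preserves the original row order and index, whereas a merge may reorder rows.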

In [9]:
THRESHOLD = 0.5
###

df_avg = pd.DataFrame(Organization_avg.items(), columns=['Entity A', 'Avg']) 

df_avg = df_avg[df_avg['Avg'] > THRESHOLD]

# Inner join on 'Entity A' keeps only the organization-type rows of the strong entities
Network_strong = df_avg.merge(Network_sub[Network_sub['Entity B Type'] == 'Organization'], on='Entity A')
# Selecting the original columns drops the helper 'Avg' column and restores the column order
Network_strong = Network_strong[['Entity A Type','Entity A','Entity B Type','Entity B','Connection_Strength']]
Network_strong
###


Unnamed: 0,Entity A Type,Entity A,Entity B Type,Entity B,Connection_Strength
0,Organization,LFB ACQUISITION LLC,Organization,TRUMP NATIONAL GOLF CLUB - BEDMINSTER,0.629015
1,Organization,PRUDENTIAL FINANCIAL,Organization,TRUMP TOWER COMMERCIAL LLC,0.920335
2,Person,HUSSAIN ALI SAJWANI,Organization,TRUMP ORGANIZATION LLC,0.974541
3,Person,MATTHEW CALAMARI,Organization,"THE TRUMP ORGANIZATION, INC.",0.755426
4,Person,RONALD C. LIEBERMAN,Organization,"THE TRUMP ORGANIZATION, INC.",0.836111
...,...,...,...,...,...
189,Organization,TRUMP ORGANIZATION LLC,Organization,"TRUMP MARKS MMA, LLC",0.951712
190,Organization,TRUMP VINEYARD ESTATES LLC,Organization,TRUMP VINEYARD ESTATES LOT 3 OWNER LLC,0.882507
191,Organization,INVESTORS SAVINGS BANK,Organization,TRUMP PARK AVENUE LLC,0.916328
192,Person,DANIEL RIDLOFF,Organization,"THE TRUMP ORGANIZATION, INC.",0.607196


In [10]:
# Test Cell: `Network_strong`
assert type(Network_strong)==pd.DataFrame, "Network_strong is not a pandas DataFrame"
assert list(Network_strong)==['Entity A Type','Entity A','Entity B Type','Entity B','Connection_Strength'], "Your Network_strong columns are not consistent with the master dataset"
assert len(Network_strong)==194, "The length of your Network_strong is not correct. Correct length should be 194."
test2 = Network_strong.sort_values(by='Connection_Strength')
test2.reset_index(drop=True, inplace=True)
assert math.isclose(test2.loc[0, 'Connection_Strength'], 0.039889119, rel_tol=1e-13)
assert math.isclose(test2.loc[100, 'Connection_Strength'], 0.744171895, rel_tol=1e-13)
assert math.isclose(test2.loc[193, 'Connection_Strength'], 0.996641965, rel_tol=1e-13)

print("\n(Passed.)")


(Passed.)


**Fin!** Remember to test your solutions by running them as the autograder will: restart the kernel and run all cells from "top-to-bottom." Also remember to submit to the autograder; otherwise, you will **not** get credit for your hard work!