### Visualizing Company Relationships
#### By Ruben Seoane

We studied a few visualization libraries, such as Seaborn, Bokeh and Plotly, for representing the training dataset visually (columns [:3]).

#### *First Attempt*<br>
We decided to create a "word cloud" visualization to show the frequency of the entities and words that appear in the training set. <br>We found the library at https://github.com/amueller/word_cloud to be a great resource, and we intended to follow this tutorial: http://minimaxir.com/2016/05/wordclouds/.<br>However, both @radpet and I ran into problems while installing the library. In my case, after installing through pip or from the original zip file, Python still did not recognize the module as installed. <br>@radpet, working on Ubuntu, was able to import the library correctly but could not render the word cloud as a layer over the .png image. <br>*_Tutorial Code Follows_*

In [None]:
import numpy as np
from PIL import Image
from os import path
import matplotlib.pyplot as plt
import random

from wordcloud import WordCloud, STOPWORDS


def grey_color_func(word, font_size, position, orientation, random_state=None,
                    **kwargs):
    return "hsl(0, 0%%, %d%%)" % random.randint(60, 100)

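# __file__ is not defined inside a notebook, so fall back to the working directory
d = path.dirname(__file__) if "__file__" in globals() else "."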

# read the mask image
# taken from
# http://www.stencilry.org/stencils/movies/star%20wars/storm-trooper.gif
mask = np.array(Image.open(path.join(d, "stormtrooper_mask.png")))

# movie script of "a new hope"
# http://www.imsdb.com/scripts/Star-Wars-A-New-Hope.html
# May the lawyers deem this fair use.
with open(path.join(d, "a_new_hope.txt")) as f:
    text = f.read()

# preprocessing the text a little bit
text = text.replace("HAN", "Han")
text = text.replace("LUKE'S", "Luke")

# adding movie script specific stopwords
stopwords = set(STOPWORDS)
stopwords.add("int")
stopwords.add("ext")

wc = WordCloud(max_words=1000, mask=mask, stopwords=stopwords, margin=10,
               random_state=1).generate(text)
# store default colored image
default_colors = wc.to_array()
plt.title("Custom colors")
plt.imshow(wc.recolor(color_func=grey_color_func, random_state=3),
           interpolation="bilinear")
wc.to_file("a_new_hope.png")
plt.axis("off")
plt.figure()
plt.title("Default colors")
plt.imshow(default_colors, interpolation="bilinear")
plt.axis("off")
plt.show()

The result should look like this (our idea was to use an image of Darth Vader as the mask, since it fits the title of the case):

In [2]:
from IPython.display import Image

# display the example output published with the tutorial
Image(url="http://minimaxir.com/img/wordclouds/a_new_hope_1.png")
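The frequency counts that a word cloud needs can be prepared without the library itself, which also gives a chance to check whether the running interpreter can see `wordcloud` at all. The following is a minimal sketch under stated assumptions: the sample rows, the column order `(company1, company2, is_parent)` and the helper names are hypothetical, not taken from the case data. One possible cause of the "module not recognized" symptom is pip installing into a different interpreter than the notebook kernel, but that is only a guess about our setup.

```python
import importlib.util
import sys
from collections import Counter

# Hypothetical rows standing in for the first three columns of the training set:
# (company1, company2, is_parent). The real column names and values may differ.
rows = [
    ("Alpha Corp", "Beta Ltd", True),
    ("Alpha Corp", "Gamma Inc", False),
    ("Beta Ltd", "Gamma Inc", False),
    ("Alpha Corp", "Beta Ltd", True),
]


def entity_frequencies(rows):
    """Count how often each company name appears in either entity column."""
    counts = Counter()
    for company1, company2, _is_parent in rows:
        counts[company1] += 1
        counts[company2] += 1
    return counts


def wordcloud_available():
    """Return True if the running interpreter can locate the wordcloud package."""
    return importlib.util.find_spec("wordcloud") is not None


def render_cloud(frequencies, out_path):
    """Render the counts as a word cloud image; return False if wordcloud is missing."""
    if not wordcloud_available():
        return False
    from wordcloud import WordCloud  # imported lazily so the rest runs without it

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(out_path)
    return True


freqs = entity_frequencies(rows)
print(freqs.most_common())
# Compare this path with the interpreter that pip installed into
print("interpreter:", sys.executable)
print("wordcloud importable:", wordcloud_available())
```

If `wordcloud importable` prints `False` inside the notebook, running `sys.executable -m pip install wordcloud` with the printed interpreter path is one way to make sure the package lands in the same environment. Calling `render_cloud(freqs, "entities.png")` would then write the image (the file name is just an example).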

#### *Second Attempt*<br>
Reviewing D3.js visualizations, I found the "Chord Diagram" to be a very interesting way to visualize the number of cases where two companies appear together, as well as whether they are related by ownership.<br> Bokeh and Plotly are good candidates: https://python-graph-gallery.com/231-chord-diagram-with-bokeh/ and https://plot.ly/python/filled-chord-diagram/ 
<br>In the chord diagram, the perimeter of the wheel is composed of the companies mentioned in the training set, and the size of each company's arc correlates with the number of times it appears in the data. Two colors for the lines connecting entities could be used to show TRUE/FALSE is_parent relationships. In this case, we had to generate a new table of data to show these relationships, which meant a distraction from the main goal. For a similar case where time is more flexible, though, it could be a good way to conceptualize the distribution of the data and entities before building the models.<br><br>As the Plotly tutorial shows, implementing the visualization can be a tedious process. The best alternative was the Bokeh implementation, whose code follows:

In [None]:
import pandas as pd
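# Note: bokeh.charts is not part of recent Bokeh releases (it was split out into
# the separate bkcharts package), so this cell requires an older Bokeh version.
from bokeh.charts import output_file, Chord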
from bokeh.io import show
from bokeh.sampledata.les_mis import data

nodes = data['nodes']
links = data['links']

nodes_df = pd.DataFrame(nodes)
links_df = pd.DataFrame(links)

source_data = links_df.merge(nodes_df, how='left', left_on='source', right_index=True)
source_data = source_data.merge(nodes_df, how='left', left_on='target', right_index=True)
source_data = source_data[source_data["value"] > 5]

chord_from_df = Chord(source_data, source="name_x", target="name_y", value="value")
output_file('chord_from_df.html', mode="inline")
show(chord_from_df)


In [7]:
from IPython.display import Image

# display the example chord diagram from the Python Graph Gallery
Image(url="https://python-graph-gallery.com/wp-content/uploads/231_Chord_Bokeh.png")
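The "new table of data" mentioned for the chord diagram could be built roughly as follows. This is only a sketch under assumptions: the sample rows, the `(company1, company2, is_parent)` column order, the `build_links` helper and the two colors are all hypothetical, and the pair order is kept as given on the assumption that is_parent is directional.

```python
from collections import Counter

# Hypothetical rows standing in for the training set columns
# (company1, company2, is_parent); the real data may look different.
rows = [
    ("Alpha Corp", "Beta Ltd", True),
    ("Alpha Corp", "Beta Ltd", True),
    ("Alpha Corp", "Gamma Inc", False),
    ("Beta Ltd", "Gamma Inc", False),
    ("Alpha Corp", "Gamma Inc", False),
]

# Assumed color scheme: one color per is_parent value
LINK_COLORS = {True: "firebrick", False: "lightgrey"}


def build_links(rows):
    """Aggregate rows into one link per (company1, company2, is_parent) combination.

    The 'value' field holds the number of rows in which the pair appears together.
    """
    counts = Counter()
    for company1, company2, is_parent in rows:
        counts[(company1, company2, bool(is_parent))] += 1
    return [
        {
            "source": company1,
            "target": company2,
            "is_parent": is_parent,
            "value": n,
            "color": LINK_COLORS[is_parent],
        }
        for (company1, company2, is_parent), n in sorted(counts.items())
    ]


links = build_links(rows)
for link in links:
    print(link)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(links)`, which yields `source`, `target` and `value` columns in the same spirit as the `source_data` frame used in the Bokeh cell above.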