# Colaboratory Assignment 4.2

**Instructions**. Below you will find several text cells with short programming problems. You can create as many code cells as you need to answer them.

There are four problems, but you will only need to solve two. You **must** choose at least one of the problems with the title in <font color='#006633'>green</font>.


**BEFORE YOU START**

Make sure to run the code cell below to fix the adjacency matrix problem. It should be the first thing you evaluate; otherwise, you will have to restart your runtime and reimport `networkx`.

In [None]:
!pip uninstall -q --yes scipy networkx && pip install -q scipy==1.8 networkx==2.7

## <font color='#006633'>1. Drawing histograms</font>

We did some drawing in the previous problem set. This lesson goes deeper into different ways to customize your plots to make them more informative. Because we want to create the optimal plots in terms of the amount of useful information they show, we will start with a big dataset.

1. Import the `imdb` dataset. It contains information extracted from the [Internet Movie Database](https://www.imdb.com/), restricted to actors and directors. A link is created when two people worked on the same movie. Note that it is a big file, so it may take a few seconds to load into Colaboratory.
2. Create a degree histogram, stored in a dictionary (as usual).
3. Draw it using the `hist` method from `matplotlib.pyplot`
4. Draw the same histogram using the `bar` method from `matplotlib.pyplot`

In [None]:
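def Hdeg(G, FlagZeros=False):
    """Return the degree histogram of G as a dictionary {degree: number of nodes}."""
    H = {}
    if FlagZeros:
        # Pre-fill every possible degree with zero so empty degrees also appear
        n = G.number_of_nodes()
        for k in range(n):
            H[k] = 0

    # Count how many nodes have each degree
    for node in G.nodes():
        k = G.degree(node)
        H[k] = H.get(k, 0) + 1

    return H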

In [None]:
import matplotlib.pyplot as plt
from readlist import readlist

In [None]:
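imdb = readlist('imdb.pkl', 0)
H_imdb = Hdeg(imdb)

# hist bins raw observations, so pass one degree per node rather than the counts
degrees = [imdb.degree(node) for node in imdb.nodes()]
plt.hist(degrees, bins=30)
plt.xlabel('k')
plt.ylabel('H(k)')
plt.title('Degree histogram of imdb')
plt.show()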

In [None]:
# bar takes the x positions (degrees) first and the bar heights (counts) second
plt.bar(list(H_imdb.keys()), list(H_imdb.values()))
plt.xlabel('k')
plt.ylabel('H(k)')
plt.title('Degree histogram of imdb')
plt.show()
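The difference between the two methods is that `hist` counts raw observations into bins, while `bar` draws heights that have already been computed. The sketch below illustrates this on a small made-up histogram; `expand_histogram`, `draw_both`, and `H_toy` are hypothetical names introduced only for this example and are not part of the assignment.

```python
def expand_histogram(H):
    """Turn a {degree: count} dictionary back into a flat list of degrees."""
    degrees = []
    for k, count in sorted(H.items()):
        degrees.extend([k] * count)
    return degrees


# Hypothetical toy histogram: three nodes of degree 1, one of degree 2, two of degree 4
H_toy = {1: 3, 2: 1, 4: 2}
degrees_toy = expand_histogram(H_toy)
print(degrees_toy)  # [1, 1, 1, 2, 4, 4]


def draw_both(H):
    """Draw the same histogram with hist (raw degrees) and bar (counts)."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no dependencies

    fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))
    # One bin per integer degree, so both panels show the same heights
    ax_hist.hist(expand_histogram(H), bins=list(range(min(H), max(H) + 2)))
    ax_hist.set_title('hist: raw degrees')
    ax_bar.bar(list(H.keys()), list(H.values()))
    ax_bar.set_title('bar: precomputed counts')
    plt.show()
```

Calling `draw_both(H_toy)` should show two panels with matching bar heights. For a large network it is cheaper to read the degrees directly from the graph than to expand the dictionary.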

## 2. What about dots?
A histogram can also be plotted using dots. This is even more flexible in terms of the information you can show. However, be careful about using dots when there is little variance in the degrees.

For this problem, create the degree histogram for the `imdb` network, which you already have as a dictionary (from the previous problem). Let's include some additional details. Make sure you use the `plot` method in `matplotlib.pyplot`:

1. Add a title for the plot
2. Add labels to both axes
3. Use green dots
4. Make the dots bigger than the default style.
5. Plot the degree histogram again, this time using a logarithmic scale for both axes.

If you're not sure how to make all these changes, you can start by looking at the [documentation](https://matplotlib.org/stable/tutorials/introductory/pyplot.html)

In [None]:
# 'go' draws green dots with no connecting line; markersize enlarges them
plt.plot(list(H_imdb.keys()), list(H_imdb.values()), 'go', markersize=8)
plt.xlabel('k')
plt.ylabel('H(k)')
plt.title('Degree histogram of imdb')
plt.show()

In [None]:
# Same plot, now with logarithmic scales on both axes
plt.plot(list(H_imdb.keys()), list(H_imdb.values()), 'go', markersize=8)
plt.xscale('log')
plt.yscale('log')
plt.xlabel('k')
plt.ylabel('H(k)')
plt.title('Degree histogram of imdb (log-log)')
plt.show()

## 3. Plotting functions

It is possible to identify when a series of data requires a logarithmic scale by looking at the function used to generate it. Evaluate the function below in the interval $[35, 146]$ (only use integers).

$$f(x) = 2^{\frac{2x}{7}}$$

Plot the result and choose the right scale for both axes.
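One way to reason about the scale before plotting is to take the base-2 logarithm of the function:

$$\log_2 f(x) = \frac{2x}{7}$$

The logarithm of $f$ is linear in $x$, so the curve becomes a straight line when only the vertical axis is logarithmic. At the endpoints of the interval, $f(35) = 2^{10} = 1024$ and $f(146) = 2^{292/7} \approx 2^{41.7}$, a ratio of about $2^{31.7}$, which is more than nine orders of magnitude. The $x$ values, by contrast, span less than one order of magnitude.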

In [None]:
y = []
for x in range(35, 146+1):
  y.append(int(2**((2*x)/7)))

print(y)

In [None]:
plt.plot(range(35, 146 + 1), y)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.yscale('log')
plt.title('Plot of f(x)')
plt.show()

## <font color='#006633'>4. Comparing Networks</font>

There are different quantities we can use to compare two networks. This time we will focus on their degree histograms.

Compare the Enron (ignore the first 4 rows) and Watergate networks. What are your conclusions just by looking at the plots?
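
A possible starting point, offered only as a sketch: since the two networks may differ in size, it can help to plot the fraction of nodes with each degree rather than the raw counts. The names `normalize_histogram` and `plot_comparison` are hypothetical, and the sketch assumes both histograms are dictionaries of the form `{degree: count}`, such as those returned by `Hdeg`. Loading the two datasets is left out because it depends on the course files.

```python
def normalize_histogram(H):
    """Convert {degree: count} into {degree: fraction of nodes}, sorted by degree."""
    total = sum(H.values())
    return {k: H[k] / total for k in sorted(H)}


def plot_comparison(H_a, H_b, label_a, label_b):
    """Overlay two normalized degree histograms on log-log axes."""
    import matplotlib.pyplot as plt  # imported here so normalize_histogram stays dependency-free

    for H, label, marker in ((H_a, label_a, 'o'), (H_b, label_b, 's')):
        P = normalize_histogram(H)
        # Degree 0 cannot be shown on a logarithmic axis, so it is skipped
        ks = [k for k in P if k > 0]
        plt.plot(ks, [P[k] for k in ks], marker, label=label)

    plt.xscale('log')
    plt.yscale('log')
    plt.xlabel('k')
    plt.ylabel('Fraction of nodes with degree k')
    plt.title('Degree histograms compared')
    plt.legend()
    plt.show()
```

With two histograms at hand, say `H_enron` and `H_watergate` (assumed names), the call would be `plot_comparison(H_enron, H_watergate, 'Enron', 'Watergate')`.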