# Project Data Analysis

This notebook contains the analysis of project data for the Reddit place experiment.

We have computed several distributions, scatter plots, and visualizations to better understand the dataset.

To run this notebook, the following files should be precomputed in the '.../data/' folder:

1) sorted_tile_placements_proj.csv : Tile updates with project information added. Format: 

#ts,user,x_coordinate,y_coordinate,color,pic_id,pixel,pixel_color

2) tile_placements_denoised_freq_proj.csv : Denoised version of the previous file using the Frequent Pixel heuristic.

3) tile_placements_denoised_users_proj.csv : Denoised version of the same file based on the figure created by users.

4) atlas.json : Filtered version of the place atlas
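
As a rough illustration of the column layout listed for the first file, the sketch below reads rows into dictionaries keyed by the documented column names. `parse_tile_rows` is a hypothetical helper (it is not part of the project's `reddit` module), the sample row is made up, and all values are kept as strings because the column types are not stated here.

```python
import csv
import io

# Column order as given in the header line for sorted_tile_placements_proj.csv.
FIELDS = ["ts", "user", "x_coordinate", "y_coordinate",
          "color", "pic_id", "pixel", "pixel_color"]

def parse_tile_rows(text):
    """Yield one dict per data row, skipping blank lines and '#' header lines."""
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        # Values stay as strings; their types are an open assumption.
        yield dict(zip(FIELDS, row))

# Made-up sample in the documented layout (values are illustrative only).
sample = (
    "#ts,user,x_coordinate,y_coordinate,color,pic_id,pixel,pixel_color\n"
    "1490979533000,abc123,10,20,5,95,1,1\n"
)
rows = list(parse_tile_rows(sample))
```

Keying rows by name rather than by position makes later code less sensitive to column order.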

# Sections:

## Single Variable
[Updates per Project](#Updates_per_Project)

[Updates Entropy per Project](#Updates_Entropy_per_Project)

[Updates Entropy (Time) per Project](#Updates_Time_Entropy_per_Project)

[Colors and Entropy (Color) per Project (Original Data)](#Colors_and_Entropy_(Color)_per_Project_Original)

[Colors and Entropy (Color) per Project (Denoised Data using Frequent Color)](#Colors_and_Entropy_(Color)_per_Project_Denoised_Frequent)

[Colors and Entropy (Color) per Project (Denoised by Users)](#Colors_and_Entropy_(Color)_per_Project_Denoised_Users)

[Pixels per Project](#Pixels_per_Project)

[Projects per User](#Projects_per_User)

[Users per Project](#Users_per_Project)

[Time per Project](#Time_per_Project)

[Area per Project](#Area_per_Project)

## Two Variables

### Update
[Update vs Entropy](#Update_vs_Entropy)

[Update vs Pixel](#Update_vs_Pixels)

[Update vs User](#Update_vs_User)

[Update vs Time](#Update_vs_Time)

[Update vs Area](#Update_vs_Area)

### Pixels

[Pixels vs Entropy](#Pixels_vs_Entropy)

[Pixels vs Users](#Pixels_vs_Users)

[Pixels vs Time](#Pixels_vs_Time)

[Pixels vs Area](#Pixels_vs_Area)

### Users

[Users vs Entropy](#Users_vs_Entropy)

[Users vs Time](#Users_vs_Time)

[Users vs Area](#Users_vs_Area)

### Time

[Time vs Entropy](#Time_vs_Entropy)

[Time vs Area](#Time_vs_Area)

### Area

[Area vs Entropy](#Area_vs_Entropy)

## Others

[Updates per Time](#Updates_per_time)

[Distance (Users)](#Distance_users)


In [1]:
import csv
import sys
import os
import math
import random
import numpy as np
import numpy.linalg as npla
import scipy
import sklearn
from scipy import sparse
from scipy import linalg
import scipy.sparse.linalg as spla
from scipy.spatial import distance
import matplotlib.pyplot as plt
from matplotlib import cm
import matplotlib.mlab as mlab
from mpl_toolkits.mplot3d import axes3d
import operator

In [2]:
import sys
sys.path.append("../Python_code") # make the modules in ../Python_code importable
from reddit import *
from canvas_vis import *
from project_data_analysis import *
from generate_proj_to_remove import *

# Setting up Variables

In [6]:
#Run this before anything else!

input_file_proj = "../data/sorted_tile_placements_proj.csv"
input_file_proj_den_freq = "../data/tile_placements_denoised_freq_proj.csv"
input_file_proj_den_users = "../data/tile_placements_denoised_users_proj.csv"
js_filename = "../data/atlas.json"

#Projects to remove 777, 1921 (whole canvas), 1240, 1516 (1 pixel), 1319 (very incomplete)
#1169, 42 (repeated American Flag, 1122), 1066 (repeated blue corner 67), 
#1757 (repeated the far left side 736), 1824 (climber's head, too small)
#320 (repeated kenkistan/rainbow flat 3311)
#351 (repeated erase the place 1297)
#1046, 1073 (repeated channel orange 958)
#998, 1870 (repeated darth plagueis the wise 75)
#1383, 1493, 1823, 1818, 645, 1640 (Very small)
#1811, 1925, 1927, 704, 1085, 1308, 1378, 1412, 1418, 1428, 1455, 1482, 1512, 1548, 1589, 
#1614, 1790, 939, 1263, 1383, 1155, 1524, 129, 1595, 1254, 1528, 1529, 1578, 1616, 1721 (Covered)
# projects_to_remove = {'777', '1921', '1169', '42', '1066', '1757', '1824', '320', '998', '1870', '1811',\
#                      '1925', '1927', '704', '1085', '1308', '1378', '1412', '1418', '1428', '1455', '1482',\
#                       '1512', '1548', '1589', '1614', '1790', '1319', '939', '1263', '1383', '1155', '1761', 
#                      '1524', '351', '129', '1046', '1073', '1595', '1254', '1528', '1529', '1578', '1616',\
#                      '1721'}

projects_to_remove = get_list_of_removed_proj(output_filename = "../data/proj_to_remove.txt")

locations = store_locations(js_filename)

names, descriptions = read_picture_names_and_descriptions(js_filename)

<a id='Updates_per_Project'></a>
# Updates per Project

In [8]:
#Computing updates per project
updates_per_proj, total_updates = updates_per_project(input_file_proj, projects_to_remove)

#Computing the updates in three different categories: agree, disagree, final
tile_updates, total_tile_updates = update_category_per_project()

#print("total updates:", total_updates)

In [10]:
sorted_up_proj = sorted(updates_per_proj.items(), key=operator.itemgetter(1), reverse=True)
print(sorted_up_proj[:10])
#Top-10
for i in range(10):
    proj = sorted_up_proj[i][0]
    up = sorted_up_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", updates: ", up, ", \ndesc: ", desc, "\n")

[('42', 612816), ('1169', 587958), ('95', 540508), ('736', 399649), ('1066', 362651), ('67', 358566), ('903', 353109), ('1757', 324259), ('1897', 309469), ('998', 293808)]


KeyError: 42

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_up_proj[-i][0]
    up = sorted_up_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", updates: ", up, ", \ndescription: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(updates_per_proj.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(updates_per_proj.values()))))

In [None]:
#Inverse cumulative density function

def plot_updates_per_project_icdf(count, output_file_name):
    plt.clf()
    ax = plt.subplot(111)
    ax.loglog(range(len(count)), count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('#updates', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlim(1,10000000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')
    
#Computing ICDF
count = icdf(updates_per_proj)
    
plot_updates_per_project_icdf(count, "../plots/plot_updates_per_project_icdf.svg")
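
The `icdf` helper used above is imported from `project_data_analysis` and its implementation is not shown in this notebook. As a minimal sketch of one plausible definition, assuming the dictionary values are non-negative integers, entry `k` of the result can be the fraction of keys whose value is at least `k`. The name `icdf_sketch` is hypothetical and may differ from the project's actual function.

```python
import numpy as np

def icdf_sketch(value_per_key):
    """Return an array whose entry k is the fraction of values >= k."""
    values = np.array(list(value_per_key.values()), dtype=int)
    # histogram[k] = number of keys whose value is exactly k
    histogram = np.bincount(values)
    # Reverse cumulative sum: tail[k] = number of keys with value >= k
    tail = np.cumsum(histogram[::-1])[::-1]
    return tail / values.size

# Toy input: one project with 1 update, two projects with 3 updates.
demo = icdf_sketch({"a": 1, "b": 3, "c": 3})
```

With this definition the curve starts at 1 and never increases, which matches how the result is plotted against `range(len(count))`.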

<a id='Updates_Entropy_per_Project'></a>
# Updates Entropy per Project

In [None]:
#Computing update-entropy per project: agreeing vs disagreeing
#tile_updates is computed in the previous block
update_entropy_per_proj=update_entropy_per_project(tile_updates)

In [None]:
sorted_up_ent_proj = sorted(update_entropy_per_proj.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_up_ent_proj[i][0]
    up_ent = sorted_up_ent_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", update_entropy: ", up_ent, ", \ndesc: ", desc, "\n")

In [None]:

#Bottom-10
for i in range(1,11):
    proj = sorted_up_ent_proj[-i][0]
    up_ent = sorted_up_ent_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", update_entropy: ", up_ent, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(update_entropy_per_proj.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(update_entropy_per_proj.values()))))

In [None]:
#Inverse cumulative density function
%matplotlib inline
def plot_update_entropies_per_project_icdf(entropy, count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.plot(entropy, count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('entropy', fontsize=30)
    ax.tick_params(labelsize=23)
    #ax.set_xticks(np.arange(0,math.log(.5)+.1, math.log(.1)))
    ax.set_yticks(np.arange(0, 1.1, .25))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

#Computing ICDF
entropy = sorted(list(update_entropy_per_proj.values()))

count = np.arange(len(entropy), 0, -1)
count = count / count[0]

plot_update_entropies_per_project_icdf(entropy, count, "../plots/plot_update_entropies_users_icdf.svg")
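
For real-valued quantities such as entropy, the ICDF in the cell above is built directly from the sorted values: the i-th smallest value is paired with the fraction of values at or above it. The sketch below wraps that idea in a hypothetical helper, `empirical_icdf`, using made-up numbers and assuming no tied values.

```python
import numpy as np

def empirical_icdf(values):
    """Return (sorted values, fraction of values >= each sorted value)."""
    xs = np.sort(np.asarray(values, dtype=float))
    n = xs.size
    # With no ties, the i-th smallest value (0-based) has n - i values at or above it.
    fractions = np.arange(n, 0, -1) / n
    return xs, fractions

xs, fr = empirical_icdf([0.2, 0.9, 0.5, 0.7])
```

Plotting `fr` against `xs` gives a curve that falls from 1 at the smallest value to `1/n` at the largest.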


<a id='Updates_Time_Entropy_per_Project'></a>
# Updates Entropy (Time) per Project

In [None]:
#Computing update-entropy per project: agreeing vs disagreeing over all the time slots (around 78 --- based on hours)
#tile_updates is computed in the previous block
update_time_entropy_per_proj=update_time_entropy_per_project(tile_updates)

In [None]:
sorted_up_time_ent_proj = sorted(update_time_entropy_per_proj.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_up_time_ent_proj[i][0]
    up_ent = sorted_up_time_ent_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", update_time_entropy: ", up_ent, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10
for i in range(1,11):
    proj = sorted_up_time_ent_proj[-i][0]
    up_ent = sorted_up_time_ent_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", update_time_entropy: ", up_ent, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(update_time_entropy_per_proj.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(update_time_entropy_per_proj.values()))))

In [None]:
#Inverse cumulative density function
%matplotlib inline
def plot_update_time_entropies_per_project_icdf(entropy, count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.plot(entropy, count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('entropy', fontsize=30)
    ax.tick_params(labelsize=23)
    #ax.set_xticks(np.arange(0,math.exp(1)+.6, .69))
    ax.set_xticks(np.arange(2,6, .5))
    ax.set_yticks(np.arange(0, 1.1, .25))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

#Computing ICDF
entropy = sorted(list(update_time_entropy_per_proj.values()))

count = np.arange(len(entropy), 0, -1)
count = count / count[0]

plot_update_time_entropies_per_project_icdf(entropy, count, "../plots/plot_update_time_entropies_users_icdf.svg")

<a id='Colors_and_Entropy_(Color)_per_Project_Original'></a>
# Colors and Entropy (Color) per Project (Original Data)

In [None]:
# Computing colors per project and entropies
#Only pixels (final) are considered.
colors_per_proj = colors_per_project(input_file_proj, projects_to_remove)
entropy_per_proj = entropy_per_project(colors_per_proj)
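
`entropy_per_project` is defined in `project_data_analysis` and is not shown here. A minimal sketch of a color entropy, assuming each project is represented by a vector of per-color pixel counts and that the natural logarithm is used (an assumption, suggested only by the 0.69 tick spacing in the plots of this section), could look like this. The name `shannon_entropy` is hypothetical.

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy (natural log) of a vector of non-negative counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0
    # Drop zero counts: the term 0 * log(0) is taken as 0.
    probs = counts[counts > 0] / total
    return float(-(probs * np.log(probs)).sum())

uniform_two = shannon_entropy([5, 5, 0, 0])  # two equally used colors
single = shannon_entropy([7, 0, 0, 0])       # one color only
```

Under these assumptions a single-color project has entropy 0, and a project using two colors equally has entropy ln 2.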

In [None]:
num_colors_per_proj = {}

for proj in colors_per_proj:
    num_colors_per_proj[proj] = np.count_nonzero(colors_per_proj[proj])

In [None]:
sorted_color_proj = sorted(num_colors_per_proj.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_color_proj[i][0]
    colors = sorted_color_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", colors: ", colors, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_color_proj[-i][0]
    colors = sorted_color_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", colors: ", colors, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(num_colors_per_proj.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(num_colors_per_proj.values()))))

In [None]:
%matplotlib inline
def plot_colors_per_project_hst(count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.bar(np.arange(count.shape[0]), count, color="black", linewidth=4)
    ax.set_ylabel('#projects', fontsize=30)
    ax.set_xlabel('colors', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlim(0,)
    ax.set_xticks(np.arange(1,17, 3))
    ax.set_yticks(np.arange(0, 300, 50))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

count = np.zeros(17)

for c in num_colors_per_proj.values():
    count[c] = count[c] + 1
    
plot_colors_per_project_hst(count, "../plots/plot_colors_project_hist.svg")

In [None]:
sorted_ent_proj = sorted(entropy_per_proj.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_ent_proj[i][0]
    ent = sorted_ent_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_ent_proj[-i][0]
    ent = sorted_ent_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(entropy_per_proj.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(entropy_per_proj.values()))))

In [None]:
%matplotlib inline
def plot_color_entropies_per_project_icdf(entropy, count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.plot(entropy, count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('entropy', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xticks(np.arange(0,math.exp(1)+.1, .69))
    ax.set_yticks(np.arange(0, 1.1, .25))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

#Computing ICDF
entropy = sorted(list(entropy_per_proj.values()))

count = np.arange(len(entropy), 0, -1)
count = count / count[0]

plot_color_entropies_per_project_icdf(entropy, count, "../plots/plot_color_entropies_icdf.svg")

<a id='Colors_and_Entropy_(Color)_per_Project_Denoised_Frequent'></a>
# Colors and Entropy (Color) per Project (Denoised Data using Frequent Color)

In [None]:
# Computing colors per project and entropies
#Only pixels (final) are considered.

colors_per_proj_den_freq = colors_per_project(input_file_proj_den_freq, projects_to_remove)
entropy_per_proj_den_freq = entropy_per_project(colors_per_proj_den_freq)

In [None]:
num_colors_per_proj_den_freq = {}

for proj in colors_per_proj_den_freq:
    num_colors_per_proj_den_freq[proj] = np.count_nonzero(colors_per_proj_den_freq[proj])

In [None]:
sorted_color_proj = sorted(num_colors_per_proj_den_freq.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_color_proj[i][0]
    colors = sorted_color_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", colors: ", colors, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_color_proj[-i][0]
    colors = sorted_color_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", colors: ", colors, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(num_colors_per_proj_den_freq.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(num_colors_per_proj_den_freq.values()))))

In [None]:
%matplotlib inline
def plot_colors_per_project_hst(count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.bar(np.arange(count.shape[0]), count, color="black", linewidth=4)
    ax.set_ylabel('#projects', fontsize=30)
    ax.set_xlabel('colors', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlim(0,)
    ax.set_xticks(np.arange(1,17, 3))
    ax.set_yticks(np.arange(0, 300, 50))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

count = np.zeros(17)

for c in num_colors_per_proj_den_freq.values():
    count[c] = count[c] + 1
    
plot_colors_per_project_hst(count, "../plots/plot_colors_project_den_freq_hist.svg")

In [None]:
sorted_ent_proj = sorted(entropy_per_proj_den_freq.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_ent_proj[i][0]
    ent = sorted_ent_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_ent_proj[-i][0]
    ent = sorted_ent_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(entropy_per_proj_den_freq.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(entropy_per_proj_den_freq.values()))))

In [None]:
%matplotlib inline
def plot_color_entropies_per_project_icdf(entropy, count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.plot(entropy, count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('entropy', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xticks(np.arange(0,math.exp(1)+.1, .69))
    ax.set_yticks(np.arange(0, 1.1, .25))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

#Computing ICDF
entropy = sorted(list(entropy_per_proj_den_freq.values()))

count = np.arange(len(entropy), 0, -1)
count = count / count[0]

plot_color_entropies_per_project_icdf(entropy, count, "../plots/plot_color_entropies_den_freq_icdf.svg")

<a id='Colors_and_Entropy_(Color)_per_Project_Denoised_Users'></a>
# Colors and Entropy (Color) per Project (Denoised Data using Canvas Denoised by Users)

In [None]:
# Computing colors per project and entropies
#Only pixels (final) are considered.
colors_per_proj_den_users = colors_per_project(input_file_proj_den_users, projects_to_remove)
entropy_per_proj_den_users = entropy_per_project(colors_per_proj_den_users)

In [None]:
num_colors_per_proj_den_users = {}

for proj in colors_per_proj_den_users:
    num_colors_per_proj_den_users[proj] = np.count_nonzero(colors_per_proj_den_users[proj])

In [None]:
sorted_color_proj = sorted(num_colors_per_proj_den_users.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_color_proj[i][0]
    colors = sorted_color_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", colors: ", colors, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_color_proj[-i][0]
    colors = sorted_color_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", colors: ", colors, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(num_colors_per_proj_den_users.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(num_colors_per_proj_den_users.values()))))

In [None]:
%matplotlib inline
def plot_colors_per_project_hst(count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.bar(np.arange(count.shape[0]), count, color="black", linewidth=4)
    ax.set_ylabel('#projects', fontsize=30)
    ax.set_xlabel('colors', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlim(0,)
    ax.set_xticks(np.arange(1,17, 3))
    ax.set_yticks(np.arange(0, 300, 50))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

count = np.zeros(17)

for c in num_colors_per_proj_den_users.values():
    count[c] = count[c] + 1
    
plot_colors_per_project_hst(count, "../plots/plot_colors_project_den_users_hist.svg")

In [None]:
sorted_ent_proj = sorted(entropy_per_proj_den_users.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_ent_proj[i][0]
    ent = sorted_ent_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_ent_proj[-i][0]
    ent = sorted_ent_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(entropy_per_proj_den_users.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(entropy_per_proj_den_users.values()))))

In [None]:
%matplotlib inline
def plot_color_entropies_per_project_icdf(entropy, count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.plot(entropy, count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('entropy', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xticks(np.arange(0,math.exp(1)+.1, .69))
    ax.set_yticks(np.arange(0, 1.1, .25))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

#Computing ICDF
entropy = sorted(list(entropy_per_proj_den_users.values()))

count = np.arange(len(entropy), 0, -1)
count = count / count[0]

plot_color_entropies_per_project_icdf(entropy, count, "../plots/plot_color_entropies_den_users_icdf.svg")

<a id='Pixels_per_Project'></a>
# Pixels per Project

In [None]:
# Computing number of pixels per project
#Only pixels (final) are considered.
pixels_per_proj = pixels_per_project(input_file_proj, projects_to_remove)

In [None]:
sorted_pixel_proj = sorted(pixels_per_proj.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_pixel_proj[i][0]
    pix = sorted_pixel_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", pixels: ", pix, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_pixel_proj[-i][0]
    pix = sorted_pixel_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", pixels: ", pix, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(pixels_per_proj.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(pixels_per_proj.values()))))

In [None]:
%matplotlib inline
def plot_pixels_per_project_icdf(count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.loglog(np.arange(count.shape[0]), count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('#pixels', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlim(1,100000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

#Computing ICDF
count = icdf(pixels_per_proj)
    
plot_pixels_per_project_icdf(count, "../plots/plot_pixels_icdf.svg")

<a id='Projects_per_User'></a>
# Projects per User

In [None]:
#Projects per User
#Only updates that agree with the final color of the tile are considered
proj_per_user = projects_per_user(input_file_proj, projects_to_remove)

In [None]:
sorted_proj_per_user = sorted(proj_per_user.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    user = sorted_proj_per_user[i][0]
    n = sorted_proj_per_user[i][1]
    
    print("#", i, ", user: ", user, ", projects: ", n)

In [None]:
#Bottom-10

for i in range(1,11):
    user = sorted_proj_per_user[-i][0]
    n = sorted_proj_per_user[-i][1]
    
    print("#", i, ", user: ", user, ", projects: ", n)

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(proj_per_user.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(proj_per_user.values()))))

In [None]:
#Median

print("Median = ", np.median(np.array(list(proj_per_user.values()))))

In [None]:
%matplotlib inline
def plot_projects_per_user_icdf(count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.loglog(np.arange(count.shape[0]), count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('#projects', fontsize=30)
    ax.tick_params(labelsize=23)
    #ax.set_yticks(np.arange(0,1.1,0.25))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')
    
#Computing ICDF
count = icdf(proj_per_user)

plot_projects_per_user_icdf(count, "../plots/plot_proj_user_icdf.svg")

<a id='Users_per_Project'></a>

# Users per Project

In [None]:
# Users per Project
#Only updates that agree with the final color of the tile are considered
users_per_proj = users_per_project(input_file_proj, projects_to_remove)   

In [None]:
sorted_users_per_proj = sorted(users_per_proj.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_users_per_proj[i][0]
    n = sorted_users_per_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", users: ", n, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_users_per_proj[-i][0]
    n = sorted_users_per_proj[-i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", users: ", n, ", \ndesc: ", desc, "\n")

In [None]:
#AVG

print("AVG = ", np.mean(np.array(list(users_per_proj.values()))))

In [None]:
#STD

print("STD = ", np.std(np.array(list(users_per_proj.values()))))

In [None]:
#Median
print("Median = ", np.median(np.array(list(users_per_proj.values()))))

In [None]:
%matplotlib inline
def plot_users_per_project_icdf(count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.loglog(np.arange(count.shape[0]), count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('#users', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlim(1, 1000000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')
    
#Computing ICDF
count = icdf(users_per_proj)

plot_users_per_project_icdf(count, "../plots/plot_user_proj_icdf.svg")

<a id='Time_per_Project'></a>
# Time per Project

In [None]:
#Time per project
#Only updates that agree with the final color of the tile are considered
times_per_proj = times_per_project(input_file_proj, projects_to_remove)

In [None]:
sorted_times_per_proj = sorted(times_per_proj.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_times_per_proj[i][0]
    t = sorted_times_per_proj[i][1] / (1000 * 60 * 60)
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", time: ", t, ", \ndesc: ", desc, "\n")

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_times_per_proj[-i][0]
    t = sorted_times_per_proj[-i][1] / (1000 * 60 * 60)
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", time: ", t, ", \ndesc: ", desc, "\n")

In [None]:
#AVG (hours)

print("AVG = ", np.mean(np.array(list(times_per_proj.values())))/(1000 * 60 * 60))

In [None]:
#STD (hours)

print("STD = ", np.std(np.array(list(times_per_proj.values())))/(1000 * 60 * 60))

In [None]:
%matplotlib inline
def plot_times_per_project_icdf(count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.semilogy(np.arange(count.shape[0]), count, color="red", linewidth=4) #x-axis is in hours (converted from milliseconds below)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('time (hours)', fontsize=30)
    ax.tick_params(labelsize=23)
    #ax.set_xticks(np.arange(0,101,25))
    #ax.set_xlim(None, 100)
    #ax.set_yticks(np.arange(0,1.1,0.25))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')
    
#Computing ICDF
times_per_proj_hours = {}

for proj in times_per_proj:
    times_per_proj_hours[proj] = int(times_per_proj[proj] / (1000 * 60 * 60))

count = icdf(times_per_proj_hours)

plot_times_per_project_icdf(count, "../plots/plot_time_proj_icdf.svg")

<a id='Area_per_Project'></a>
# Area per Project

In [None]:
#Area per project (area is between 0 and 1)
#input_file_proj= "../data/sorted_tile_placements_proj.csv"
area_per_proj = area_per_project(input_file_proj,projects_to_remove)

In [None]:
sorted_area_per_proj = sorted(area_per_proj.items(), key=operator.itemgetter(1), reverse=True)

#Top-10
for i in range(10):
    proj = sorted_area_per_proj[i][0]
    a = sorted_area_per_proj[i][1]
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", area: ", a, ", \ndesc: ", desc, "\n")

In [None]:

#Bottom-10
for i in range(1,11):
    proj = sorted_area_per_proj[-i][0]
    a = sorted_area_per_proj[-i][1] 
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    print("#", i, ", project: ", name, ", area: ", a, ", \ndesc: ", desc, "\n")

In [None]:
#AVG 

print("AVG = ", np.mean(np.array(list(area_per_proj.values()))))

In [None]:
#STD 

print("STD = ", np.std(np.array(list(area_per_proj.values()))))

In [None]:
%matplotlib inline
def plot_area_per_project_icdf(area, count, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.plot(area, count, color="red", linewidth=4)
    ax.set_ylabel('ICDF', fontsize=30)
    ax.set_xlabel('Area', fontsize=30)
    ax.tick_params(labelsize=23)
    #ax.set_xticks(np.arange(0,101,25))
    #ax.set_xlim(None, 100)
    #ax.set_yticks(np.arange(0,1.1,0.25))
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')
    
    
#Computing ICDF
area = sorted(list(area_per_proj.values()))
count = np.arange(len(area), 0, -1)
count = count / count[0]


plot_area_per_project_icdf(area,count, "../plots/plot_area_proj_icdf.svg")

# Two Variables

<a id='Update_vs_Entropy'></a>
# Update vs Entropy

In [None]:
#Updates vs. entropy 
X,Y = Create_Array(updates_per_proj,entropy_per_proj)

#Updates vs.entropy: ratio and ID
ratios, IDs = Ratio(updates_per_proj,entropy_per_proj, names)
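
`Create_Array` and `Ratio` come from `project_data_analysis` and are not shown in this notebook. Comparing two per-project statistics requires that both arrays list the projects in the same order and only include projects present in both dictionaries. The sketch below shows one way to do that with a hypothetical helper, `paired_arrays`; it is a stand-in under those assumptions and may differ from what `Create_Array` actually does.

```python
import numpy as np

def paired_arrays(first, second):
    """Return the shared keys and two value arrays aligned on those keys."""
    # Only keys present in both dictionaries can be compared.
    keys = sorted(set(first) & set(second))
    x = np.array([first[k] for k in keys])
    y = np.array([second[k] for k in keys])
    return keys, x, y

# Toy data: project "1" has no entry in the second dictionary and is dropped.
keys, x, y = paired_arrays({"95": 10, "67": 4, "1": 2},
                           {"95": 1.5, "67": 0.7})
```

Returning the keys alongside the arrays keeps it possible to map a point in a scatter plot back to its project id.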


In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation
np.corrcoef(X, Y)
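
`np.corrcoef(X, Y)` returns the full 2×2 correlation matrix; the Pearson coefficient between the two variables is its off-diagonal entry. Because the scatter plots in this section use a logarithmic x-axis, it may also be informative to compare it with the correlation computed on log-scaled values. The sketch below uses made-up numbers and a hypothetical `pearson` helper, not the notebook's data.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient: the off-diagonal entry of np.corrcoef."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy data where y grows linearly with log10(x).
x = np.array([1.0, 10.0, 100.0, 1000.0])
y = np.array([0.0, 1.0, 2.0, 3.0])

raw_r = pearson(x, y)            # correlation on the raw values
log_r = pearson(np.log10(x), y)  # correlation after log-scaling x
```

On this toy data the log-scaled correlation is essentially 1 while the raw one is lower, which shows how a heavy-tailed variable can mask a relationship that is linear on a log scale.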

In [None]:
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    ent = entropy_per_proj[proj]
    up = updates_per_proj[proj]
    
    print("#", i,", project: ", name, ", entropy: ", ent, ", updates: ", up, ", entropy/update: ", r, "\
        , \ndescription: ", desc, "\n")

In [None]:
#Bottom-10    
for i in range(1, 11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    ent = entropy_per_proj[proj]
    up = updates_per_proj[proj]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", updates: ", up, ", entropy/update: ", r, "\
        , \ndescription: ", desc, "\n")

In [None]:
%matplotlib inline

def plot_updates_vs_entropy(X, Y, output_file_name):
    plt.clf()
    fig, ax = plt.subplots()
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('entropy', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#updates', fontsize=30)
    ax.set_xscale('log')
    #ax.set_yticks(np.arange(0.69,math.exp(1)+.1, .69))
    ax.set_xlim(1,None)
    
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_updates_vs_entropy(X, Y, "../plots/plot_updates_vs_entropy.svg")

In [None]:
#Showing project IDs on mouse hovering. Based on:
#https://stackoverflow.com/questions/7908636/possible-to-make-labels-appear-when-hovering-over-a-point-in-matplotlib

%matplotlib tk
cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('entropy', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#updates', fontsize=30)
ax.set_xscale('log')
ax.set_yticks(np.arange(np.min(Y),math.exp(1)+.1, .69))
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Update_vs_Pixels'></a>
# Update vs Pixel

In [None]:
#Updates vs. Pixels 
X,Y = Create_Array(updates_per_proj,pixels_per_proj)


#Updates vs. pixels: ratio and ID
ratios, IDs = Ratio(updates_per_proj,pixels_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation
np.corrcoef(X, Y)
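
`np.corrcoef` returns a 2x2 matrix, and the Pearson coefficient of interest is the off-diagonal entry. Because the scatter plots in this notebook use log-scaled axes, it may also be informative to correlate the log-transformed values. The sketch below is only an illustration: `pearson_r` and its `log_x`/`log_y` flags are hypothetical names, not part of the notebook, and the log option assumes strictly positive inputs.

```python
import numpy as np

def pearson_r(x, y, log_x=False, log_y=False):
    """Return the Pearson correlation of x and y as a plain float.

    Hypothetical helper: log_x / log_y apply log10 first and assume
    strictly positive values (counts such as updates, pixels, users).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if log_x:
        x = np.log10(x)
    if log_y:
        y = np.log10(y)
    # corrcoef returns [[1, r], [r, 1]]; pick the off-diagonal entry.
    return float(np.corrcoef(x, y)[0, 1])

# Toy data, not taken from the dataset.
x = [1, 10, 100, 1000]
y = [2, 20, 200, 2000]
print(pearson_r(x, y))
print(pearson_r(x, y, log_x=True, log_y=True))
```

With the notebook's arrays, a call such as `pearson_r(X, Y, log_x=True, log_y=True)` would give the correlation in the same space as a log-log plot.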

In [None]:
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    pix = pixels_per_proj[proj]
    up = updates_per_proj[proj]
    
    print("#", i,", project: ", name, ", pixels: ", pix, ", updates: ", up, ", pixels/update: ", r, "\
        , \ndescription: ", desc, "\n")

In [None]:
#Bottom-10    
for i in range(1, 11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    pix = pixels_per_proj[proj]
    up = updates_per_proj[proj]
    
    print("#", i,", project: ", name, ", pixels: ", pix, ", updates: ", up, ", pixels/update: ", r, "\
        , \ndescription: ", desc, "\n")
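
The Top-10 and Bottom-10 cells repeat the same loop for every pair of variables. A minimal sketch of a shared helper is shown below. It assumes `sorted_ratios` is a list of `(project_id, ratio)` pairs sorted in descending order and that `names` can be indexed by `int(project_id)`, as in the cells above. The name `format_ranked` and the toy data are hypothetical.

```python
def format_ranked(sorted_ratios, names, k=10, bottom=False):
    """Return one summary line per project for the top-k or bottom-k ratios."""
    # For the bottom-k, walk the descending list from its end.
    rows = sorted_ratios[::-1][:k] if bottom else sorted_ratios[:k]
    lines = []
    for rank, (proj, ratio) in enumerate(rows, start=1):
        lines.append(
            "#{} project: {}, ratio: {:.4f}".format(rank, names[int(proj)], ratio)
        )
    return lines

# Toy data standing in for the notebook's structures.
toy_names = {0: "A", 1: "B", 2: "C"}
toy_sorted = [("2", 3.0), ("0", 2.0), ("1", 0.5)]

for line in format_ranked(toy_sorted, toy_names, k=2):
    print(line)
for line in format_ranked(toy_sorted, toy_names, k=2, bottom=True):
    print(line)
```

Ranks start at 1 for both directions, which avoids the mix of 0-based and 1-based numbering between the Top-10 and Bottom-10 loops.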

In [None]:
%matplotlib inline

def plot_updates_vs_pixels(X, Y, output_file_name):
    plt.clf()
    fig, ax = plt.subplots()
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('#pixels', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#updates', fontsize=30)
    ax.set_xscale('log')
    ax.set_yscale('log')
    ax.set_ylim(1,1000000)
    ax.set_xlim(1,None)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_updates_vs_pixels(X, Y, "../plots/plot_updates_vs_pixels.svg")

In [None]:
#Showing project IDs on mouse hovering. Based on:
#https://stackoverflow.com/questions/7908636/possible-to-make-labels-appear-when-hovering-over-a-point-in-matplotlib

%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('#pixels', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#updates', fontsize=30)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_ylim(1,1000000)
ax.set_xlim(1,None)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)
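
When several points overlap under the cursor, `"".join(IDs[ind["ind"]])` concatenates their labels with no separator, which can be hard to read. One possible alternative is sketched below. `format_hit_labels`, `max_labels` and `sep` are hypothetical names introduced for illustration, and the function only builds the annotation text, so it could be called from `update_annot`.

```python
def format_hit_labels(labels, hit_indices, max_labels=3, sep=", "):
    """Join the labels of all points under the cursor into one readable string."""
    picked = [str(labels[i]) for i in hit_indices]
    if len(picked) > max_labels:
        # Keep the annotation short and say how many labels were left out.
        extra = len(picked) - max_labels
        picked = picked[:max_labels] + ["+{} more".format(extra)]
    return sep.join(picked)

# Toy labels, not taken from the atlas.
toy_ids = ["a", "b", "c", "d", "e"]
print(format_hit_labels(toy_ids, [0, 1]))
print(format_hit_labels(toy_ids, [0, 1, 2, 3, 4]))
```

Inside `update_annot`, the text could then be set with `annot.set_text(format_hit_labels(IDs, ind["ind"]))`, assuming `IDs` is indexable by the integers that `sc.contains` reports.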

<a id='Update_vs_User'></a>
# Update vs User

In [None]:
#Updates vs. users per project
X,Y = Create_Array(updates_per_proj, users_per_proj)

#Updates vs. users: ratio and ID
ratios, IDs = Ratio(updates_per_proj,users_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation
np.corrcoef(X, Y)

In [None]:
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    usr = users_per_proj[proj]
    up = updates_per_proj[proj]
    
    print("#", i, ", project: ", name, ", users: ", usr, ", updates: ", up, ", users/update: ", r, "\
        , \ndescription: ", desc, "\n")

In [None]:
#Bottom-10    
for i in range(1, 11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    usr = users_per_proj[proj]
    up = updates_per_proj[proj]
    
    print("#", i, ", project: ", name, ", users: ", usr, ", updates: ", up, ", users/update: ", r, "\
        , \ndescription: ", desc, "\n")

In [None]:
%matplotlib inline

def plot_updates_vs_users(X, Y, output_file_name):
    plt.clf()
    fig, ax = plt.subplots()
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('#users', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#updates', fontsize=30)
    ax.set_xscale('log')
    ax.set_yscale('log')
    ax.set_ylim(1,1000000)
    ax.set_xlim(1,None)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_updates_vs_users(X, Y, "../plots/plot_updates_vs_users.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('#users', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#updates', fontsize=30)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_ylim(1,1000000)
ax.set_xlim(1,None)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Update_vs_Time'></a>
# Update vs Time

In [None]:

#Updates vs. time per project
X,Y = Create_Array(updates_per_proj, times_per_proj)

#Updates vs. time: ratio and ID
ratios, IDs = Ratio(updates_per_proj,times_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]/ (1000 * 60 * 60)
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    tm = times_per_proj[proj] / (1000 * 60 * 60)
    up = updates_per_proj[proj]
    
    print("#", i, ", project: ", name, ", time: ", tm, ", updates: ", up, ", times/update: ", r, "\
        , \ndescription: ", desc, "\n") 
    

In [None]:
#Bottom-10    
for i in range(1, 11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]/ (1000 * 60 * 60)
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    tm = times_per_proj[proj] / (1000 * 60 * 60)
    up = updates_per_proj[proj]
    
    print("#", i, ", project: ", name, ", time: ", tm, ", updates: ", up, ", times/update: ", r, "\
        , \ndescription: ", desc, "\n") 
    

In [None]:
%matplotlib inline
Y=Y/ (1000 * 60 * 60)
def plot_updates_vs_times(X, Y, output_file_name):
    plt.clf()
    ax = plt.subplot()
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('time (hours)', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#updates', fontsize=30)
    ax.set_xscale('log')
    ax.set_xlim(1,None)
    ax.set_ylim(0,100)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_updates_vs_times(X, Y, "../plots/plot_updates_vs_times.svg")
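
The cell above rescales `Y` in place from milliseconds to hours, so running it twice divides the values twice. A minimal sketch of a non-mutating conversion follows. `MS_PER_HOUR` and `ms_to_hours` are hypothetical names, and the toy durations are not taken from the dataset.

```python
import numpy as np

MS_PER_HOUR = 1000 * 60 * 60

def ms_to_hours(values_ms):
    """Return a new float array in hours; the input is left untouched."""
    return np.asarray(values_ms, dtype=float) / MS_PER_HOUR

# Toy durations in milliseconds.
Y_ms = np.array([3_600_000, 7_200_000, 1_800_000])
Y_hours = ms_to_hours(Y_ms)
print(Y_hours)
```

Keeping the converted values in a separate variable such as `Y_hours` makes the plotting cell safe to re-run, since the millisecond array is never overwritten.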

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('time (hours)', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#updates', fontsize=30)
ax.set_xscale('log')
ax.set_xlim(1,None)
ax.set_ylim(0,100)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Update_vs_Area'></a>
# Update vs Area

In [None]:
#Updates vs. area per project
X,Y = Create_Array(updates_per_proj, area_per_proj)

#Updates vs. area: ratio and ID
ratios, IDs = Ratio(updates_per_proj,area_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    ar = area_per_proj[proj] 
    up = updates_per_proj[proj]
    
    print("#", i, ", project: ", name, ", area: ", ar, ", updates: ", up, ", area/update: ", r, "\
        , \ndescription: ", desc, "\n") 

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    ar = area_per_proj[proj] 
    up = updates_per_proj[proj]
    
    print("#", i, ", project: ", name, ", area: ", ar, ", updates: ", up, ", area/update: ", r, "\
        , \ndescription: ", desc, "\n")

In [None]:
%matplotlib inline
def plot_updates_vs_area(X, Y, output_file_name):
    plt.clf()
    ax = plt.subplot()
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('area', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#updates', fontsize=30)
    ax.set_xscale('log')
    ax.set_xlim(1,None)
    #ax.set_ylim(0,100)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_updates_vs_area(X, Y, "../plots/plot_updates_vs_area.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('area', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#updates', fontsize=30)
ax.set_xscale('log')
ax.set_xlim(1,None)
#ax.set_ylim(0,100)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Pixels_vs_Entropy'></a>
# Pixels vs Entropy

In [None]:
#Pixels vs. entropy per project
X,Y = Create_Array(pixels_per_proj, entropy_per_proj)

#Pixels vs. entropy: ratio and ID
ratios, IDs = Ratio(pixels_per_proj,entropy_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Entropy/pixel
  
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    
    ent = entropy_per_proj[proj]
    pix = pixels_per_proj[proj]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", pixels: ", pix, ", entropy/pixel: ", r, "\
        , \ndescription: ", desc, "\n")     

In [None]:
#Bottom-10
for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    ent = entropy_per_proj[proj]
    pix = pixels_per_proj[proj]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", pixels: ", pix, ", entropy/pixel: ", r, "\
        , \ndescription: ", desc, "\n") 

In [None]:
%matplotlib inline
def plot_pixels_vs_entropy(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('entropy', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#pixels', fontsize=30)
    ax.set_yticks(np.arange(0.69,math.exp(1)+.1, .69))
    ax.set_xscale('log')
    ax.set_xlim(1,1000000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_pixels_vs_entropy(X, Y, "../plots/plot_pixels_vs_entropy.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('entropy', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#pixels', fontsize=30)
ax.set_yticks(np.arange(0.69,math.exp(1)+.1, .69))
ax.set_xscale('log')
ax.set_xlim(1,1000000)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Pixels_vs_Users'></a>
# Pixels vs Users

In [None]:
#Pixels vs. users per project
X,Y = Create_Array(pixels_per_proj, users_per_proj)

#Pixels vs. users: ratio and ID
ratios, IDs = Ratio(pixels_per_proj,users_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#users/pixel
  
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    
    usr = users_per_proj[proj]
    pix = pixels_per_proj[proj]
    
    print("#", i, ", project: ", name, ", users: ", usr, ", pixels: ", pix, ", users/pixel: ", r, "\
        , \ndescription: ", desc, "\n") 

In [None]:
#Bottom-10
for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    usr = users_per_proj[proj]
    pix = pixels_per_proj[proj]
    
    print("#", i, ", project: ", name, ", users: ", usr, ", pixels: ", pix, ", users/pixel: ", r, "\
        , \ndescription: ", desc, "\n")

In [None]:
%matplotlib inline
def plot_pixels_vs_users(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('#users', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#pixels', fontsize=30)
    ax.set_xscale('log')
    ax.set_yscale('log')
    ax.set_ylim(1,1000000)
    ax.set_xlim(1,1000000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_pixels_vs_users(X, Y, "../plots/plot_pixels_vs_users.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('#users', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#pixels', fontsize=30)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_ylim(1,1000000)
ax.set_xlim(1,1000000)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Pixels_vs_Time'></a>
# Pixels vs Time

In [None]:
#Pixels vs. time per project
X,Y = Create_Array(pixels_per_proj, times_per_proj)

#Pixels vs. time: ratio and ID
ratios, IDs = Ratio(pixels_per_proj, times_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Time/Pixel
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]/(1000 * 60 * 60)
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    
    tm = times_per_proj[proj] / (1000 * 60 * 60)
    pix = pixels_per_proj[proj]
    
    print("#", i, ", project: ", name, ", time: ", tm, ", pixels: ", pix, ", time/pixel: ", r, "\
        , \ndescription: ", desc, "\n") 
    

In [None]:
#Bottom-10
for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]/(1000 * 60 * 60)
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    tm = times_per_proj[proj] / (1000 * 60 * 60)
    pix = pixels_per_proj[proj]
    
    print("#", i, ", project: ", name, ", time: ", tm, ", pixels: ", pix, ", time/pixel: ", r, "\
        , \ndescription: ", desc, "\n") 

In [None]:
%matplotlib inline
Y= Y/(3600*1000)
def plot_pixels_vs_times(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('time (hours)', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#pixels', fontsize=30)
    ax.set_xscale('log')
    ax.set_ylim(0,100)
    ax.set_xlim(1,1000000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_pixels_vs_times(X, Y, "../plots/plot_pixels_vs_times.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('time (hours)', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#pixels', fontsize=30)
ax.set_xscale('log')
#ax.set_ylim(0,100)
#ax.set_xlim(1,1000000)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Pixels_vs_Area'></a>
# Pixels vs Area

In [None]:
#Pixels vs. area per project
X,Y = Create_Array(pixels_per_proj, area_per_proj)

#Pixels vs. area: ratio and ID
ratios, IDs = Ratio(pixels_per_proj, area_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Area/Pixel
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    
    ar = area_per_proj[proj] 
    pix = pixels_per_proj[proj]
    
    print("#", i, ", project: ", name, ", area: ", ar, ", pixels: ", pix, ", area/pixel: ", r, "\
        , \ndescription: ", desc, "\n") 

In [None]:
#Bottom-10

for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    
    ar = area_per_proj[proj] 
    pix = pixels_per_proj[proj]
    
    print("#", i, ", project: ", name, ", area: ", ar, ", pixels: ", pix, ", area/pixel: ", r, "\
        , \ndescription: ", desc, "\n") 

In [None]:
%matplotlib inline
def plot_pixels_vs_area(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('area', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#pixels', fontsize=30)
    ax.set_xscale('log')
    #ax.set_ylim(0,100)
    ax.set_xlim(1,1000000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_pixels_vs_area(X, Y, "../plots/plot_pixels_vs_area.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('area', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#pixels', fontsize=30)
ax.set_xscale('log')
#ax.set_ylim(0,100)
ax.set_xlim(1,1000000)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Users_vs_Entropy'></a>
# Users vs Entropy

In [None]:
#Users vs. entropy per project

X,Y = Create_Array(users_per_proj, entropy_per_proj)

#Users vs. entropy: ratio and ID
ratios, IDs = Ratio(users_per_proj, entropy_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Entropy/user
#Top-10

for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    
    ent = entropy_per_proj[proj] 
    usr = users_per_proj[proj]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", users: ", usr, ", entropy/user: ", r, "\
        , \ndescription: ", desc, "\n") 

In [None]:
#Entropy/user
#Bottom-10

for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
        
    name = names[int(proj)]
    desc = descriptions[int(proj)]
   
    
    ent = entropy_per_proj[proj] 
    usr = users_per_proj[proj]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", users: ", usr, ", entropy/user: ", r, "\
        , \ndescription: ", desc, "\n") 

In [None]:
%matplotlib inline
def plot_users_vs_entropy(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('entropy', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#users', fontsize=30)
    ax.set_xscale('log')
    ax.set_yticks(np.arange(0.69, math.exp(1)+.1, .69))
    ax.set_xlim(1,1000000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_users_vs_entropy(X, Y, "../plots/plot_users_vs_entropy.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('entropy', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#users', fontsize=30)
ax.set_xscale('log')
ax.set_yticks(np.arange(0.69, math.exp(1)+.1, .69))
ax.set_xlim(1,1000000)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Users_vs_Time'></a>
# Users vs Time

In [None]:
#Users vs. time per project
X,Y = Create_Array(users_per_proj, times_per_proj)

#Users vs. time: ratio and ID
ratios, IDs = Ratio(users_per_proj, times_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Time/User
#Top -10
for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]/ (1000 * 60 * 60)
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    tm = times_per_proj[proj] / (1000 * 60 * 60)
    usr = users_per_proj[proj]
    
    print("#", i, ", project: ", name, ", time: ", tm, ", users: ", usr, ", time/user: ", r, "\
        , \ndescription: ", desc, "\n")    

In [None]:
#Time/User
#Bottom -10
for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]/(1000 * 60 * 60)
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    tm = times_per_proj[proj] / (1000 * 60 * 60)
    usr = users_per_proj[proj]
    
    print("#", i, ", project: ", name, ", time: ", tm, ", users: ", usr, ", time/user: ", r, "\
        , \ndescription: ", desc, "\n") 

In [None]:
%matplotlib inline
Y = Y/(1000 * 60 * 60)
def plot_users_vs_times(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('time (hours)', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#users', fontsize=30)
    ax.set_xscale('log')
    ax.set_ylim(0,100)
    ax.set_xlim(1,1000000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_users_vs_times(X, Y, "../plots/plot_users_vs_times.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('time (hours)', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#users', fontsize=30)
ax.set_xscale('log')
ax.set_ylim(0,100)
ax.set_xlim(1,1000000)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Users_vs_Area'></a>
# Users vs Area

In [None]:
#Users vs. area per project
X,Y = Create_Array(users_per_proj, area_per_proj)

#Users vs. area: ratio and ID
ratios, IDs = Ratio(users_per_proj, area_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Area/User
#Top -10
for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    ar = area_per_proj[proj] 
    usr = users_per_proj[proj]
    
    print("#", i, ", project: ", name, ", area: ", ar, ", users: ", usr, ", area/user: ", r, "\
        , \ndescription: ", desc, "\n")   

In [None]:
#Area/User
#Bottom-10
for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    ar = area_per_proj[proj] 
    usr = users_per_proj[proj]
    
    print("#", i, ", project: ", name, ", area: ", ar, ", users: ", usr, ", area/user: ", r, "\
        , \ndescription: ", desc, "\n")  

In [None]:
%matplotlib inline
def plot_users_vs_area(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_ylabel('area', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xlabel('#users', fontsize=30)
    ax.set_xscale('log')
    #ax.set_ylim(0,100)
    ax.set_xlim(1,1000000)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_users_vs_area(X, Y, "../plots/plot_users_vs_area.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_ylabel('area', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xlabel('#users', fontsize=30)
ax.set_xscale('log')
#ax.set_ylim(0,100)
ax.set_xlim(1,1000000)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Time_vs_Entropy'></a>
# Time vs Entropy

In [None]:
#Time vs. entropy per project

X,Y = Create_Array(times_per_proj, entropy_per_proj)

#Time vs. entropy: ratio and ID
ratios, IDs = Ratio(times_per_proj, entropy_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Entropy/Time
#Top-10
for i in range(10):
    proj = sorted_ratios[i][0]
    # Assuming Ratio returns entropy per millisecond, multiply to get entropy per hour
    r = sorted_ratios[i][1] * (1000 * 60 * 60)
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    tm = times_per_proj[proj] / (1000 * 60 * 60)
    ent = entropy_per_proj[proj]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", time: ", tm, ", entropy/time: ", r, "\
        , \ndescription: ", desc, "\n")    

In [None]:
#Bottom-10
for i in range(1,11):
    proj = sorted_ratios[-i][0]
    # Assuming Ratio returns entropy per millisecond, multiply to get entropy per hour
    r = sorted_ratios[-i][1] * (1000 * 60 * 60)
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    tm = times_per_proj[proj] / (1000 * 60 * 60)
    ent = entropy_per_proj[proj]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", time: ", tm, ", entropy/time: ", r,
          "\ndescription: ", desc, "\n")

In [None]:
%matplotlib inline
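# Convert milliseconds to hours. X is rescaled in place, so run this cell only once per Create_Array call.
X=X/(1000 * 60 * 60)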
def plot_entropy_vs_times(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_xlabel('times (hours)', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_ylabel('entropy', fontsize=30)
    ax.set_yticks(np.arange(0.69,math.exp(1)+.1, .69))
    ax.set_xlim(0,100)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_entropy_vs_times(X, Y, "../plots/plot_entropy_vs_times.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_xlabel('times (hours)', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_ylabel('entropy', fontsize=30)
ax.set_yticks(np.arange(0.69,math.exp(1)+.1, .69))
ax.set_xlim(0,100)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Time_vs_Area'></a>
# Time vs Area

In [None]:
#Time vs. area per project

X,Y = Create_Array(times_per_proj, area_per_proj)

#Time vs. area: ratio and ID
ratios, IDs = Ratio(times_per_proj, area_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Area/Time
#Top-10
for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]*(1000*3600)
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    ar = area_per_proj[proj] 
    t = times_per_proj[proj]/(1000*3600)
    
    print("#", i, ", project: ", name, ", time: ", t, ", area: ", ar, ", area/time: ", r,
          "\ndescription: ", desc, "\n")

In [None]:
#Area/Time
#Bottom-10
for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]*(1000*3600)
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    ar = area_per_proj[proj] 
    t = times_per_proj[proj]/(1000*3600)
    
    print("#", i, ", project: ", name, ", time: ", t, ", area: ", ar, ", area/time: ", r,
          "\ndescription: ", desc, "\n")
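
The top-10 and bottom-10 cells above rank projects by a per-time ratio and rescale it from milliseconds to hours. A minimal sketch of that ranking follows, assuming the ratio is the second quantity divided by the duration in milliseconds. `ratio_per_hour` and the toy dictionaries are hypothetical stand-ins and not the notebook's `Ratio` helper.

```python
MS_PER_HOUR = 1000 * 3600

def ratio_per_hour(times_ms, values):
    """Return {project: value per hour}, skipping projects with zero duration."""
    return {
        proj: values[proj] * MS_PER_HOUR / times_ms[proj]
        for proj in times_ms
        if proj in values and times_ms[proj] > 0
    }

# Hypothetical toy data: duration in milliseconds and area in pixels
toy_times = {"a": 2 * MS_PER_HOUR, "b": 10 * MS_PER_HOUR, "c": 0}
toy_area = {"a": 100, "b": 50, "c": 30}

rates = ratio_per_hour(toy_times, toy_area)
# Sort descending: the head holds the fastest projects, the tail the slowest
ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
print("top:", ranked[0], "bottom:", ranked[-1])
```

Projects with zero duration are dropped here to avoid division by zero; whether the notebook's own helper does the same is not shown in this section.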

In [None]:
%matplotlib inline
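# Convert milliseconds to hours. X is rescaled in place, so run this cell only once per Create_Array call.
X=X/(1000*3600)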
def plot_area_vs_times(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_xlabel('times (hours)', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_ylabel('area', fontsize=30)
    #ax.set_yticks(np.arange(0.69,math.exp(1)+.1, .69))
    ax.set_xlim(0,100)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_area_vs_times(X, Y, "../plots/plot_area_vs_times.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_xlabel('times (hours)', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_ylabel('area', fontsize=30)
#ax.set_yticks(np.arange(0.69,math.exp(1)+.1, .69))
ax.set_xlim(0,100)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Area_vs_Entropy'></a>
# Area vs Entropy

In [None]:
#Area vs. entropy per project

X,Y = Create_Array(area_per_proj, entropy_per_proj)

#Area vs. entropy: ratio and ID
ratios, IDs = Ratio(area_per_proj, entropy_per_proj, names)

In [None]:
sorted_ratios = sorted(ratios.items(), key=operator.itemgetter(1), reverse=True)

In [None]:
#Correlation

np.corrcoef(X, Y)

In [None]:
#Entropy/Area
#Top-10
for i in range(10):
    proj = sorted_ratios[i][0]
    r = sorted_ratios[i][1]
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    ar = area_per_proj[proj]
    ent = entropy_per_proj[proj]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", area: ", ar, ", entropy/area: ", r,
          "\ndescription: ", desc, "\n")

In [None]:
#Entropy/Area
#Bottom-10
for i in range(1,11):
    proj = sorted_ratios[-i][0]
    r = sorted_ratios[-i][1]
    
    name = names[int(proj)]
    desc = descriptions[int(proj)]
    
    ar = area_per_proj[proj]
    ent = entropy_per_proj[proj]
    
    print("#", i, ", project: ", name, ", entropy: ", ent, ", area: ", ar, ", entropy/area: ", r,
          "\ndescription: ", desc, "\n")

In [None]:
%matplotlib inline
def plot_entropy_vs_area(X, Y, output_file_name):
    plt.clf()

    ax = plt.subplot(111)
    ax.scatter(X, Y, color="green", marker='x', s=5)
    ax.set_xlabel('area', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_ylabel('entropy', fontsize=30)
    ax.set_yticks(np.arange(0.69,math.exp(1)+.1, .69))
    #ax.set_xlim(0,100)
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

plot_entropy_vs_area(X, Y, "../plots/plot_entropy_vs_area.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, color="green", marker='x', s=5)
ax.set_xlabel('area', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_ylabel('entropy', fontsize=30)
ax.set_yticks(np.arange(0.69,math.exp(1)+.1, .69))
#ax.set_xlim(0,100)
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind):
    pos = sc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    text = "{}".format("".join(IDs[ind["ind"]]))
    annot.set_text(text)
    
def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        cont, ind = sc.contains(event)
        if cont:
            update_annot(ind)
            annot.set_visible(True)
            fig.canvas.draw_idle()
        else:
            if vis:
                annot.set_visible(False)
                fig.canvas.draw_idle()
    
fig.canvas.mpl_connect("motion_notify_event", hover)

<a id='Updates_per_time'></a>
# Updates per Time

In [None]:
#---already computed (first block)
#tile_updates, total_tile_updates = update_category_per_project()
#use tile_updates and total_tile_updates

In [None]:
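def generate_figures_time(hour_marks, picID):
    """Draw one temporary canvas snapshot per hour mark.

    hour_marks must be a NumPy array of hour offsets from begin_time;
    picID selects a project, or None for the whole canvas.
    """
    begin_time = 1490918688000  # reference start time in milliseconds
    cutoffs = begin_time + 1000 * 60 * 60 * hour_marks

    if picID is None:
        data = extract_canvas_color('../data/sorted_tile_placements.csv', 0, 1000, 0, 1000, cutoffs)
    else:
        data = extract_project_color('../data/sorted_tile_placements_proj.csv', picID, cutoffs)

    # One snapshot per hour mark, written where plot_updates_per_time_with_figure expects it
    for t in range(len(hour_marks)):
        tmp_file_name = "../plots/tmp_fig_" + str(t) + ".png"
        draw_canvas(canvas_color_code_rgb(data[t]), tmp_file_name)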
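
The hour marks passed to `generate_figures_time` are turned into absolute cutoff timestamps by adding whole hours, in milliseconds, to `begin_time`. The sketch below shows that conversion and its inverse in plain Python. The helper names are hypothetical; only the start timestamp and the milliseconds-per-hour factor come from the cell above.

```python
BEGIN_TIME_MS = 1490918688000  # same reference timestamp as in generate_figures_time
MS_PER_HOUR = 1000 * 60 * 60

def hour_mark_to_timestamp(hour):
    """Absolute cutoff timestamp (ms) for an hour offset from the start."""
    return BEGIN_TIME_MS + MS_PER_HOUR * hour

def timestamp_to_hour_index(ts_ms):
    """Zero-based hour bucket that a timestamp (ms) falls into."""
    return (ts_ms - BEGIN_TIME_MS) // MS_PER_HOUR

# Cutoffs for the four snapshots used in this section
cutoffs = [hour_mark_to_timestamp(h) for h in (24, 48, 72, 96)]
print(cutoffs)
```

The inverse mapping is the kind of bucketing one would need to build per-hour curves such as those in the update plots, although the notebook's own implementation is defined elsewhere.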

In [None]:
from canvas_vis import *
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
                                  AnnotationBbox)
from matplotlib.cbook import get_sample_data
import time
from mpl_toolkits.axes_grid1 import make_axes_locatable
from mpl_toolkits.axes_grid1.axes_divider import make_axes_area_auto_adjustable

%matplotlib inline
def plot_updates_per_time_with_figure(pixel, pixel_color, other, proj, hour_marks, up_line, zoom, output_file_name):
    plt.clf()
    ax = plt.subplot(111)
    total = pixel+pixel_color+other
    ax.plot(np.arange(pixel.shape[0]), 100*pixel, color="green", linewidth=5, label="Final", linestyle='-')
    ax.plot(np.arange(pixel_color.shape[0]), 100*pixel_color, color="blue", linewidth=4, label="Match", linestyle='--')
    ax.plot(np.arange(other.shape[0]), 100*other, color="red", linewidth=3, label="Adv", linestyle=':')
       
    for t in range(len(hour_marks)):
        tmp_file_name = "../plots/tmp_fig_"+str(t)+".png"
        arr_img = plt.imread(tmp_file_name, format='png')
        
        imagebox = OffsetImage(arr_img, zoom=zoom)
        imagebox.image.axes = ax
    
        ab = AnnotationBbox(imagebox, (hour_marks[t], up_line),
                        None,
                        xycoords='data',
                        pad=0.1)

        ax.add_artist(ab)
    
    ax.set_ylabel('update (%)', fontsize=20)
    ax.set_xlabel('time (hours)', fontsize=20)
    ax.tick_params(labelsize=18)
    ax.set_xlim(0, 110)

    ax.set_xticks(np.arange(0,95, 24))
    ax.set_ylim(0, 5.2)
    ax.set_yticks(np.arange(0,1.3, .4))
    
    ax.vlines(np.array(hour_marks),ymin=0,ymax=up_line,color='k', linestyle='--')
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.2),
          fancybox=True, shadow=True, ncol=3, fontsize=15)
    ax.ticklabel_format(useOffset=False, style='plain')
    #ax.set_aspect(aspect=.2)
    
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

In [None]:
generate_figures_time(np.array([24, 48, 72, 96]), None)

In [None]:
#zoom =.04 decides how large the figures should be

plot_updates_per_time_with_figure(total_tile_updates["final_updates"], total_tile_updates["agreeing_updates"],
    total_tile_updates["disagreeing_updates"], None, np.array([24, 48, 72, 96]), 1.7, .04,
    "../plots/plot_updates_time_total_fig.svg")

In [None]:
%matplotlib inline
def plot_updates_per_time_all_types(pixel, pixel_color, other, output_file_name):
    plt.clf()
    ax = plt.subplot(111)
    total = pixel+pixel_color+other
    ax.plot(np.arange(pixel.shape[0]), 100*pixel, color="green", linewidth=5, label="Final", linestyle='-')
    ax.plot(np.arange(pixel_color.shape[0]), 100*pixel_color, color="blue", linewidth=4, label="Match", linestyle='--')
    ax.plot(np.arange(other.shape[0]), 100*other, color="red", linewidth=3, label="Adv", linestyle=':')
        
    ax.set_ylabel('update (%)', fontsize=30)
    ax.set_xlabel('time (hours)', fontsize=30)
    ax.tick_params(labelsize=23)
    ax.set_xticks(np.arange(0,97, 24))
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.2),
          fancybox=True, shadow=True, ncol=3, fontsize=15)
    ax.ticklabel_format(useOffset=False, style='plain')
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

In [None]:
#total updates per time all types
plot_updates_per_time_all_types(total_tile_updates["final_updates"], total_tile_updates["agreeing_updates"],
    total_tile_updates["disagreeing_updates"], "../plots/plot_updates_time_total.svg")

In [None]:
%matplotlib tk

cmap = plt.cm.RdYlGn
norm = plt.Normalize(1,4)

pixel = total_tile_updates["final_updates"]

pixel_color = total_tile_updates["agreeing_updates"]

other = total_tile_updates["disagreeing_updates"]

fig,ax = plt.subplots()
total = pixel+pixel_color+other
line1, = ax.plot(np.arange(pixel.shape[0]), 100*pixel, color="green", linewidth=5, label="Final", linestyle='-')
line2, = ax.plot(np.arange(pixel_color.shape[0]), 100*pixel_color, color="blue", linewidth=4, label="Match", linestyle='--')
line3, = ax.plot(np.arange(other.shape[0]), 100*other, color="red", linewidth=3, label="Adv", linestyle=':')
        
ax.set_ylabel('update (%)', fontsize=30)
ax.set_xlabel('time (hours)', fontsize=30)
ax.tick_params(labelsize=23)
ax.set_xticks(np.arange(0,97, 24))
ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.2),
          fancybox=True, shadow=True, ncol=3, fontsize=15)
ax.ticklabel_format(useOffset=False, style='plain')
    
annot = ax.annotate("", xy=(0,0), xytext=(5,5),textcoords="offset points", size=14)
annot.set_visible(False)   

def update_annot(ind, line):
    x,y = line.get_data()
    annot.xy = (x[ind["ind"][0]], y[ind["ind"][0]])
    text = "{}, {}".format(str(annot.xy[0]), 
                           str(np.around(annot.xy[1], decimals=2)))
    annot.set_text(text)

def hover(event):
    vis = annot.get_visible()
    if event.inaxes == ax:
        # Check the lines in order and annotate the first one under the cursor
        for line in (line1, line2, line3):
            cont, ind = line.contains(event)
            if cont:
                update_annot(ind, line)
                annot.set_visible(True)
                fig.canvas.draw_idle()
                return
        # The cursor is not on any line: hide a visible annotation
        if vis:
            annot.set_visible(False)
            fig.canvas.draw_idle()

fig.canvas.mpl_connect("motion_notify_event", hover)

## Examples for selected projects

In [None]:
generate_figures_time(np.array([24, 48, 72, 96]), 286)

In [None]:
picID = 286

plot_updates_per_time_with_figure(tile_updates[picID]["final_updates"], tile_updates[picID]["agreeing_updates"],
    tile_updates[picID]["disagreeing_updates"], picID, np.array([24, 48, 72, 96]), 1.8, 0.045, 
    "../plots/plot_updates_time_286_fig.svg")

In [None]:
generate_figures_time(np.array([24, 48, 72, 96]), 1824)

In [None]:
from canvas_vis import *
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
                                  AnnotationBbox)
from matplotlib.cbook import get_sample_data
import time
from mpl_toolkits.axes_grid1 import make_axes_locatable
from mpl_toolkits.axes_grid1.axes_divider import make_axes_area_auto_adjustable

%matplotlib inline
# Redefines the function from the earlier cell with a wider y-range (0 to 28) and default y-ticks
def plot_updates_per_time_with_figure(pixel, pixel_color, other, proj, hour_marks, up_line, zoom, output_file_name):
    plt.clf()
    ax = plt.subplot(111)
    total = pixel+pixel_color+other
    ax.plot(np.arange(pixel.shape[0]), 100*pixel, color="green", linewidth=5, label="Final", linestyle='-')
    ax.plot(np.arange(pixel_color.shape[0]), 100*pixel_color, color="blue", linewidth=4, label="Match", linestyle='--')
    ax.plot(np.arange(other.shape[0]), 100*other, color="red", linewidth=3, label="Adv", linestyle=':')
       
    for t in range(len(hour_marks)):
        tmp_file_name = "../plots/tmp_fig_"+str(t)+".png"
        arr_img = plt.imread(tmp_file_name, format='png')
        
        imagebox = OffsetImage(arr_img, zoom=zoom)
        imagebox.image.axes = ax
    
        ab = AnnotationBbox(imagebox, (hour_marks[t], up_line),
                        None,
                        xycoords='data',
                        pad=0.1)

        ax.add_artist(ab)
    
    ax.set_ylabel('update (%)', fontsize=20)
    ax.set_xlabel('time (hours)', fontsize=20)
    ax.tick_params(labelsize=18)
    ax.set_xlim(0, 110)

    ax.set_xticks(np.arange(0,95, 24))
    ax.set_ylim(0, 28)
    #ax.set_yticks(np.arange(0,1.3, .4))
    
    ax.vlines(np.array(hour_marks),ymin=0,ymax=up_line,color='k', linestyle='--')
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.2),
          fancybox=True, shadow=True, ncol=3, fontsize=15)
    ax.ticklabel_format(useOffset=False, style='plain')
    #ax.set_aspect(aspect=.2)
    
    plt.savefig(output_file_name, dpi=300, bbox_inches='tight')

picID = 1824

plot_updates_per_time_with_figure(tile_updates[picID]["final_updates"], tile_updates[picID]["agreeing_updates"],
    tile_updates[picID]["disagreeing_updates"], picID, np.array([24, 48, 72, 96]), 22., 0.045, 
    "../plots/plot_updates_time_1824_fig.svg")

In [None]:
picID = 179
plot_updates_per_time_all_types(tile_updates[picID]["final_updates"], tile_updates[picID]["agreeing_updates"],
    tile_updates[picID]["disagreeing_updates"], "../plots/plot_updates_time_179.svg")

In [None]:
picID = 1493
plot_updates_per_time_all_types(tile_updates[picID]["final_updates"], tile_updates[picID]["agreeing_updates"],
    tile_updates[picID]["disagreeing_updates"], "../plots/plot_updates_time_1493.svg")

In [None]:
picID = 2
plot_updates_per_time_all_types(tile_updates[picID]["final_updates"], tile_updates[picID]["agreeing_updates"],
    tile_updates[picID]["disagreeing_updates"], "../plots/plot_updates_time_2.svg")

<a id='Distance_users'></a>
# Distance (Users')

In [None]:
#Two types of distances: euclidean and cosine
# sample size denotes the number of considered users per project to compute the distance
sample_size=500
euc_dis,cos_dis,rand_dis = distance_per_project_all(input_file_proj,projects_to_remove, sample_size)
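
The two distance types named above can be illustrated on small vectors. This is a minimal sketch in plain Python, assuming each user is represented by a numeric vector; what those vectors contain, and how `distance_per_project_all` samples and aggregates users, is not shown in this section. All function names and toy values below are hypothetical.

```python
import math
from itertools import combinations

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """One minus cosine similarity; 1.0 if either vector is all zeros (a convention assumed here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def mean_pairwise(vectors, dist):
    """Average distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(dist(u, v) for u, v in pairs) / len(pairs)

# Hypothetical user vectors for one project
toy_users = [[0, 0], [3, 4], [0, 0]]
print(mean_pairwise(toy_users, euclidean_distance))
```

Euclidean distance is sensitive to how active each user is, while cosine distance compares only the direction of the vectors, which is one reason to look at both histograms.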

In [None]:
plt.hist(list(euc_dis.values()), bins=20)  # list(): dict views are not array-like in Python 3

In [None]:
plt.hist(rand_dis[0], bins=20)

In [None]:
plt.hist(list(cos_dis.values()), bins=20)  # list(): dict views are not array-like in Python 3

In [None]:
plt.hist(rand_dis[1], bins=20)

In [None]:
#Test: the users' involvement in a project
#project_per_user_lst=projects_per_user_list(input_file_proj, projects_to_remove)
#users_per_proj_lst=users_per_project_list(input_file_proj, projects_to_remove)