## Problem
An undirected graph is connected if there is a path connecting any two nodes. A tree is a connected (undirected) graph containing no cycles; this definition forces the tree to have a branching structure organized around a central core of nodes, just like its living counterpart.

We have already grown familiar with trees in "Mendel's First Law", where we introduced the probability tree diagram to visualize the outcomes of a random variable.

In the creation of a phylogeny, taxa are encoded by the tree's leaves, or nodes having degree 1. A node of a tree having degree larger than 1 is called an internal node.
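
The leaf/internal-node distinction depends only on node degree, so it can be read straight off an edge list. The following is a minimal sketch, not part of the original solution; the helper name `classify_nodes` and the five-node example tree are assumptions made for illustration.

```python
from collections import Counter


def classify_nodes(edges):
    """Split the nodes of an undirected tree into leaves and internal nodes.

    `edges` is an iterable of (u, v) pairs; each pair adds 1 to the degree
    of both endpoints.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    leaves = sorted(node for node, d in degree.items() if d == 1)
    internal = sorted(node for node, d in degree.items() if d > 1)
    return leaves, internal


# Hypothetical tree: node 3 joins 1, 2 and 4; node 4 also joins 5.
example_edges = [(1, 3), (2, 3), (3, 4), (4, 5)]
leaves, internal = classify_nodes(example_edges)
print(leaves, internal)
```

In this toy tree the three leaves would play the role of taxa, while the two internal nodes only hold the branching structure together.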

**Given**: A positive integer n (n ≤ 1000) and an adjacency list corresponding to a graph on n nodes that contains no cycles.
**Return**: The minimum number of edges that can be added to the graph to produce a tree.
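
One way to reason about the answer: a graph with no cycles is a forest, and joining c separate components into a single tree takes exactly c - 1 new edges. Because each edge of a forest merges two components, c = n - m for m edges, so the answer is also (n - 1) - m. The sketch below checks both views on the sample data using a small union-find; the function names are assumptions for illustration, and nodes are assumed to be labelled 1..n.

```python
def count_components(n, edges):
    """Count connected components of a graph on nodes 1..n with union-find."""
    parent = list(range(n + 1))  # index 0 is unused

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n  # every node starts as its own component
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return components


def edges_to_complete_tree(n, edges):
    """Minimum number of edges needed to connect a forest into one tree."""
    return count_components(n, edges) - 1


# Sample dataset from this notebook: 10 nodes, 6 edges.
sample_edges = [(1, 2), (2, 8), (4, 10), (5, 9), (6, 10), (7, 9)]
print(edges_to_complete_tree(10, sample_edges))  # 3
```

The sample splits into the components {1, 2, 8}, {4, 6, 10}, {5, 7, 9} and the isolated node {3}, so three edges are needed, which matches (10 - 1) - 6.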


In [3]:
import networkx as nx
import toolz as tz
from toolz.curried import *

In [45]:
input_data = {
    "sample": {
        "n": 10,
        "adj_list": """1 2
2 8
4 10
5 9
6 10
7 9"""
    }
}

with open("data/rosalind_tree.txt", "r") as f:
    input_data["test"] = {
        "n": int(f.readline().strip()),
        "adj_list": f.read().strip()
    }

cur_state = "test"
cur_data = input_data[cur_state]

# parse the raw text: split into lines, then turn each "u v" line into [u, v]
input_processor = compose(list, map(lambda x: [int(num) for num in x.split(" ")]), flip(str.split, "\n"))

# build an undirected adjacency dict (node -> list of neighbors) from the edge list
def make_g(adj_list):
    g = {item: [] for item  in unique(concat(adj_list))}
    for n_1, n_2 in adj_list:
        g[n_1].append(n_2)
        g[n_2].append(n_1)
    return g


@tz.curry
def get_required_edge_num(n, g):
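    # every undirected edge is stored in two adjacency lists, so halve the total
    edges_n = sum(len(neighbors) for neighbors in g.values()) // 2

    # a tree on n nodes has exactly n - 1 edges; the acyclic input lacks the rest
    return (n - 1) - edges_n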

run = compose(get_required_edge_num(cur_data["n"]), make_g, input_processor)

run(cur_data["adj_list"])

90