# Look at the big picture

Welcome to the Invertebrate Institute! A research advisor you work for has come to you with an interesting problem. Many higher-order organisms tend to display bilateral asymmetry in their brains: there are differences between the left and right sides of the brain. "Insects," the researcher argues, "are *far* too simple to display such a complicated process as bilateral asymmetry!" Being a statistician, you ask him exactly how he came to this conclusion. "Well, I took a look at a network of the left and right sides of the fly brain, and they looked similar enough to me!" While this type of analysis might satisfy some people, it isn't quite enough for you. You want to know exactly how similar the two sides are. You want to know whether there are subtle differences between the halves of the brain that were imperceptible to his naked eye. You want to know what it even means for the halves of the brain to be similar! You are an expert in machine learning, so surely you can use your background to help answer this question.

If these are the types of questions you have when you see new network data, then this is the right book for you. 

## Framing the problem

The first question to ask your colleague is: what exactly is the objective here? How will science (or a company) use and benefit from the knowledge we hope to gain? In network machine learning, the choice of model is *everything*. The model determines what sorts of questions we are capable of asking, and what sorts of *answers* we are capable of learning. Asking about the objectives will directly shape which models and approaches you use.

Your colleague replies that he will give you two networks: the left and right brain networks of *Drosophila*, the fruit fly, in which each node is an individual neuron. The edge weights represent how strongly one neuron is able to communicate with another. You also have one more piece of information: each neuron belongs to one of four cell types (Kenyon cells, projection neurons, output neurons, and input neurons), each of which is responsible for a distinct function in the fly's brain. Your colleague wants to know whether the connections between the different cell types differ between the left and right brain networks.

The next question to ask is what the current solution looks like. This will help you understand where to start, and will give you a reference point for judging the performance of your techniques. Your colleague answers that presently everything is done by manual inspection alone, and that no investigation thus far has directly studied asymmetry in the brain of an insect. Manual inspection is crude and qualitative, and it provides no performance metrics to compare against. So you've got a totally novel problem to approach!

Next, you need to determine what type of network machine learning problem you have. What type of data do you have? Do you have any covariates associated with that data? What type of question do you want to answer? Do you want to test a hypothesis, or make predictions? What characteristics will your model need to reflect to answer the question appropriately? Before you go any further, try to answer these questions for yourself.

From your colleague's response, you've learned quite a bit about the problem at hand. You know you have two networks, along with a covariate for every node in each network: the neuron's cell type. Further, you know you want to answer a yes-or-no question: "Are the networks similar, or are they different?" This means you will need to design a hypothesis test. Finally, you need to be precise about what the word "different" means, and your colleague has clarified it for you: do the connections between cell types differ between the left and right brain networks? This means that we want statistical models that take a network with node labels and let us distinguish the connections between different cell types.

## Check the assumptions

Throughout this book, we will keep the assumptions made by the techniques we might pick in direct focus. You want to choose the simplest set of assumptions that can reasonably reflect the data, which means using the simplest statistical model that can answer the question you want to address. In this case, we don't care about individual neuron-to-neuron connections at all: we only care about how groups of neurons behave in relation to other groups of neurons. This means that we want to choose models that allow us to learn about pairs of neuron groups, which is a very different problem from learning about individual neurons themselves. You don't want to build an analysis that produces results about pairs of neuron groups, only to find out afterward that your boss actually wanted you to compare individual neurons!
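To make the difference between "individual neurons" and "pairs of neuron groups" concrete, here is a minimal sketch, assuming each hemisphere is given as a weighted adjacency matrix `A` (where `A[i, j]` is the weight of the edge from neuron `i` to neuron `j`) plus one cell-type label per neuron. The helper `group_connectivity` is hypothetical, not taken from any library: it collapses the neuron-to-neuron weights into a single average weight for each ordered pair of cell types. The tiny networks and the labels `"K"` and `"P"` are made-up toy data, not the fly connectome.

```python
import numpy as np


def group_connectivity(A, labels):
    """Hypothetical helper: mean edge weight from each group to each group.

    A      : (n, n) weighted adjacency matrix; A[i, j] is the edge weight i -> j.
    labels : length-n array with one group (cell-type) label per node.
    Returns (groups, B), where B[k, l] is the mean of A[i, j] over all
    i in groups[k] and j in groups[l]. Diagonal blocks include self-loops.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)  # sorted unique group labels
    B = np.zeros((len(groups), len(groups)))
    for k, g_from in enumerate(groups):
        rows = labels == g_from  # boolean mask of source-group nodes
        for l, g_to in enumerate(groups):
            cols = labels == g_to  # boolean mask of target-group nodes
            B[k, l] = A[np.ix_(rows, cols)].mean()
    return groups, B


# Toy data (assumption): two tiny "hemispheres" sharing the same node labels.
labels = np.array(["K", "K", "P", "P"])
A_left = np.array([[0, 1, 2, 0],
                   [1, 0, 0, 2],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]])
A_right = np.array([[0, 1, 0, 1],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1],
                    [0, 1, 1, 0]])

groups, B_left = group_connectivity(A_left, labels)
_, B_right = group_connectivity(A_right, labels)
diff = B_left - B_right  # one entry per ordered pair of cell types

print(groups)
print(diff)
```

The 4-by-4 neuron-level matrices shrink to 2-by-2 group-level matrices, and the question "do the halves differ?" becomes a question about a handful of group-pair entries. The left-minus-right matrix is only a descriptive summary; deciding whether those entries differ by more than chance is the job of the hypothesis test described above.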

After talking over your understanding of the problem with your research colleague, you are confident that he wants a hypothesis test capable of determining whether the connections between pairs of neuron groups differ between the left and right brain networks. You now have the green light to get coding!