# Final Project
The goal of the final project is to practice what we've learned in class on a real network dataset. You'll need to combine the intuition and framework from lectures with the toolset we've covered during computer labs. You'll also be asked to think creatively about your dataset beyond the scope of the tools you have.

Your final project will be done in groups of 2-3 people (no exceptions for larger or smaller groups). Your presentation and writeup will be graded on the inclusion of the items in the following description. 

## Final Project Guidelines
For your final project, your team will be asked to complete the following tasks:
* Decide what network you'd like to study. There are options and ideas below.
* Learn about the network dataset you've been assigned. Describe it for us. 
    * What are the nodes?
        * How many nodes are there?
    * What are the edges?
        * How many edges are there?
    * Is it weighted or unweighted?
        * If weighted, what do the weights represent/mean?
    * Is it directed or undirected?
    * Is it connected or disconnected? If disconnected, is there a giant component?
    * Is there anything interesting or weird about it?
    * What's the context of this network?
    * Do you have access to any metadata about the nodes?
* Create an adjacency matrix for your network using your `.csv` file
* Plot your network
* Find the most central node(s) in your network 
    * Do so for each of the four centrality metrics we've discussed in class. 
        * Plot your network with a different colour for each "most central" node
    * Do the centrality metrics identify different node(s) as the "most central"? 
    * Given the context of your dataset, interpret which centrality metric _you_ think is most informative. Explain.
* Find and plot the communities in your network
    * We've only discussed one graph partitioning algorithm to do this: Girvan-Newman. 
    * Do the identified communities make sense given the context of your data?
* Report on and interpret one more attribute of the data. Examples are:
    * Average (in/out-)degree
    * Average clustering coefficient
    * Degree distribution 
    * Average shortest path
    * [Anything documented here](https://networkx.org/documentation/stable/reference/algorithms/index.html)
        * NOTE: If you report on an attribute of the data that we have _not_ covered in class, you are expected to _understand it_ well enough to describe it to the class. You are welcome to come to my office hours to discuss this, or to make an appointment with me. Most of the `networkx` documentation includes a description of the method/metric/algorithm.
* Conclude
    * Do you think about this dataset differently than you did before? 
    * Did you learn anything about the discipline/domain the network lies in?
    * What's something you wish you could learn about the network that you might not have a tool for?
    * Is there any metadata about the nodes/edges that you wish you could have to interpret your findings better?
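
The descriptive questions in the list above (node and edge counts, weighted or not, directed or not, connected or not) can be answered directly from an adjacency matrix. The following is a minimal sketch, not a required template. It assumes the matrix is a square NumPy array; the toy values and the helper name `describe_network` are made up for illustration.

```python
import networkx as nx
import numpy as np

# Toy symmetric, weighted adjacency matrix (made-up values, not course data).
# Nodes 0-2 form a triangle; nodes 3-4 form a separate pair.
A = np.array([
    [0, 2, 1, 0, 0],
    [2, 0, 3, 0, 0],
    [1, 3, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])


def describe_network(A):
    """Summarise basic properties of a network given its adjacency matrix."""
    directed = not np.array_equal(A, A.T)     # asymmetric matrix => directed
    weighted = not np.isin(A, [0, 1]).all()   # entries other than 0/1 => weighted
    create_using = nx.DiGraph if directed else nx.Graph
    G = nx.from_numpy_array(A, create_using=create_using)

    # For directed graphs, "connected" here means weakly connected.
    if directed:
        components = list(nx.weakly_connected_components(G))
    else:
        components = list(nx.connected_components(G))
    largest = max(components, key=len)

    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "directed": directed,
        "weighted": weighted,
        "connected": len(components) == 1,
        "largest_component_fraction": len(largest) / G.number_of_nodes(),
    }


summary = describe_network(A)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Whether the largest component counts as a "giant" component is a judgment call; the fraction of nodes it contains is one number you might use to argue either way.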

```python
import networkx as nx
```
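
The centrality, community, and "one more attribute" steps in the guidelines might be sketched as follows. This is an illustration under stated assumptions: it uses `networkx`'s built-in karate club graph as a stand-in for your data, and it assumes the four centrality metrics are degree, closeness, betweenness, and eigenvector centrality. Substitute whichever four were actually covered in class.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Built-in example network (Zachary's karate club), standing in for your data.
G = nx.karate_club_graph()

# Assumed set of four metrics; swap in the ones covered in class.
centralities = {
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}

# Node with the highest score under each metric.
most_central = {
    name: max(scores, key=scores.get) for name, scores in centralities.items()
}
print(most_central)

# Girvan-Newman yields successively finer partitions.
# For a connected graph, the first partition has two communities.
first_split = next(girvan_newman(G))
communities = [sorted(c) for c in first_split]
print([len(c) for c in communities])

# One integer per node (its community index), in the order of G.nodes().
community_of = {node: i for i, c in enumerate(communities) for node in c}
node_colours = [community_of[node] for node in G.nodes()]

# A few candidate "extra" attributes of the network.
extras = {
    "average_degree": 2 * G.number_of_edges() / G.number_of_nodes(),
    "average_clustering": nx.average_clustering(G),
    "average_shortest_path": nx.average_shortest_path_length(G),
}
print(extras)
```

The `node_colours` list can be passed as `node_color` to `nx.draw(G, node_color=node_colours)` to colour nodes by community. Note that `nx.average_shortest_path_length` raises an error on a disconnected graph, so for such data you would compute it on the largest component instead.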

# Final Report
Your final report should address each of the points above and should include the code you wrote and the plots you produced for each section. Write the report in a Jupyter Notebook and download it as a PDF compiled with LaTeX. If you don't have LaTeX on your machine, you can instead choose File -> Download As -> LaTeX, open the downloaded LaTeX file in a basic text editor, and copy and paste its contents into [Overleaf](https://www.overleaf.com/).

You should submit your compiled report to one team member's GitHub by 1 PM on Thursday, February 2. Tag `@izabel-aguiar` and your team members' GitHub accounts in the description of the commit.

# Final Presentation
Your team will be asked to give a 10-minute presentation of your findings to the class on our last day, February 2. Bring one team member's computer for sharing the presentation. If you don't feel you'll have time to present all your findings, you may pick a few that you find most interesting.

# Network options
The following are all examples of networks you can study for your final project. When you submit your project proposal, I will reply with a `.csv` or `.npy` file of the network data you've chosen to study. Please read the abstract of the corresponding paper, or find relevant information about your network prior to choosing it to study. 
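
Once you receive your file, loading it might look like the sketch below. The exact layout of the files is not specified here, so this assumes the file holds a square adjacency matrix, stored in the `.csv` case as a headerless, comma-separated grid of numbers. The function name `load_adjacency` and the file names are hypothetical.

```python
import os
import tempfile

import networkx as nx
import numpy as np


def load_adjacency(path):
    """Load a square adjacency matrix from a .csv or .npy file (assumed layout)."""
    if path.endswith(".npy"):
        A = np.load(path)
    elif path.endswith(".csv"):
        # Assumes a headerless, comma-separated grid of numbers.
        A = np.loadtxt(path, delimiter=",")
    else:
        raise ValueError(f"unsupported file type: {path}")
    A = np.atleast_2d(A)
    if A.shape[0] != A.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {A.shape}")
    return A


# Round-trip a toy matrix through both formats in a temporary directory.
toy = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "toy_network.csv")  # hypothetical file name
    npy_path = os.path.join(tmp, "toy_network.npy")  # hypothetical file name
    np.savetxt(csv_path, toy, delimiter=",", fmt="%d")
    np.save(npy_path, toy)
    A_csv = load_adjacency(csv_path)
    A_npy = load_adjacency(npy_path)

# Build an undirected graph from the loaded matrix.
G = nx.from_numpy_array(A_csv)
print(G.number_of_nodes(), G.number_of_edges())
```

If your file turns out to be an edge list or to include a header row, the `np.loadtxt` call would need adjusting (for example, with its `skiprows` argument).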

* [The Diffusion of Microfinance Village(s)](https://economics.mit.edu/sites/default/files/2022-08/science.1236498.pdf)
    * Any of twelve different types of relationships between people in a village in southern India
* [Gossip Village data](http://stanford.edu/~arungc/BCDJ_gossip.pdf)
    * Any of seven different types of relationships between people in a village in southern India
* [London Underground Transportation Network](https://tfl.gov.uk)
    * Nodes are stations and weighted edges connect stations if there is an underground route between them
* [Social network of physicians in Illinois](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwiPvaedh9X8AhUalYkEHVk5A-4QFnoECAsQAQ&url=https%3A%2F%2Fwww.suz.uzh.ch%2Fdam%2Fjcr%3Affffffff-f952-f950-ffff-ffffdde58273%2F03.18_coleman_et-al_57.pdf&usg=AOvVaw0sBH1zgvLf6Y_yCbwgTBhH)
    * Any of three interactions between physicians (advice, collaboration, or friendship)
* [Caenorhabditis elegans connectome](https://doi-org.stanford.idm.oclc.org/10.1073/pnas.0506806103)
    * Neuron synaptic interactions of the [roundworm](https://g.co/kgs/mfH4qs)
* [Various genetic/protein interactions](https://thebiogrid.org)
* [Microbiome interactions](https://www.nature.com/articles/nature13178)
    * In any of 18 human body sites
* [Food and Agriculture Organization of the UN: Food imports/exports between countries](http://www.fao.org/)
    * Look at imports/exports of any of 300+ food groups between countries
* Collect your own social network data from friends/sports team 
    * I have a Python script that can help you with this!
    * Or you can do this easily with an Excel/Google Sheets spreadsheet
    * There must be at least 20 people in your network.
* [Les Mis characters](http://www-personal.umich.edu/~mejn/netdata/)
    * Co-appearance network of characters in the novel _Les Misérables_.
* [_David Copperfield_ words](http://www-personal.umich.edu/~mejn/netdata/)
    * Adjacency network of common adjectives and nouns in the novel _David Copperfield_ by Charles Dickens
* [Dolphins social network](http://www-personal.umich.edu/~mejn/netdata/)
    * Frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand
* [Any network here](https://snap.stanford.edu/data/#socnets)
    * e.g.,
        * who-trusts-whom network of people who trade using Bitcoin
        * 'circles' (or 'friends lists') from Facebook
        * Hyperlinks between subreddits on Reddit
* [Any network here](https://manliodedomenico.com/data.php)
    * These are all "multilayer networks": networks wherein multiple _types_ of relationships are defined over the same nodeset. You'll need to pick one layer of these graphs to study; we can talk about it.
* [Interactions between Yeast Proteins](http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm)
* [Jazz Musicians Collaborations](https://deim.urv.cat/~alexandre.arenas/data/welcome.htm)
    * [paper](https://arxiv.org/abs/cond-mat/0307434)
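
If you choose the option of collecting your own social network data in a spreadsheet, one simple layout is an edge list with one relationship per row, exported as CSV. The sketch below shows how such a file could be turned into a graph. The column names `person_a` and `person_b` and all the names in the rows are made up, and the data is read from an in-memory string rather than a real file.

```python
import csv
import io

import networkx as nx

# Made-up rows standing in for a spreadsheet exported as CSV.
# The column names "person_a" and "person_b" are hypothetical.
sheet = io.StringIO(
    "person_a,person_b\n"
    "Ana,Ben\n"
    "Ben,Chloe\n"
    "Ana,Chloe\n"
    "Dev,Ana\n"
)

# Each row becomes one undirected edge; nodes are created automatically.
G = nx.Graph()
for row in csv.DictReader(sheet):
    G.add_edge(row["person_a"].strip(), row["person_b"].strip())

print(G.number_of_nodes(), "people,", G.number_of_edges(), "ties")

# The project requires at least 20 people for self-collected data.
meets_minimum = G.number_of_nodes() >= 20
print("meets 20-person minimum:", meets_minimum)
```

With a real export, replacing `sheet` with `open("your_file.csv", newline="")` would read from disk; people with no ties would need to be added separately with `G.add_node`.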