# Python Notebook to Create Venn Diagrams

This notebook is designed to highlight the concepts of Unit 8 of the Introduction to Biology and Programming module - specifically, showing how you can complete the same task using two different languages. 

By working through this notebook, you will see how to load `csv` data into a dataframe with Python and then plot a Venn diagram from it. You can then compare this with doing the same task in `R` to highlight the differences and similarities between the two languages.

## Installing the Required Modules

The first thing we need to do is install the modules we will be using. These modules are:

 * `pandas` - This provides a lot of data analysis functions and objects. For this notebook we will be using its `DataFrame` objects, which are essentially the same as the `R` data frames you've already met.
 * `venn` - This is an extension to the `matplotlib` library that will actually plot the Venn diagrams. Most plotting tasks in Python can be carried out using the `matplotlib` module alone, but unfortunately not Venn diagrams.
 
To install the two required modules, we'll use the shell-based `pip` command with the exclamation mark (`!`) to indicate it should be run through the shell rather than in Python:

In [None]:
!pip3 install --user pandas
!pip3 install --user venn

To check this install has worked, try importing the two modules:

In [None]:
import pandas
import venn

If you get an error from these `import` statements, please try restarting the kernel (from the menu, `Kernel -> Restart Kernel`) as this will make sure it is checking for modules in your user area.
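
If you would rather check for the modules without triggering an error, the standard library can look a module up by name. This is only a sketch: the helper name `is_installed` is made up for this example and is not part of either module.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can find a module with this name."""
    return importlib.util.find_spec(module_name) is not None

# Report on the two modules this notebook relies on
for name in ("pandas", "venn"):
    print(name, "found" if is_installed(name) else "NOT found")
```

If a module is reported as not found straight after installing it, restarting the kernel as described above is the first thing to try.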

## Reading in the Data

Now that we have the modules installed and working, we can load in the data. One of the helpful functions `pandas` provides is `read_csv`. This will load in the `csv` data we want and create a `DataFrame` object from it:

In [None]:
gene_list = pandas.read_csv('venn_gene_list.csv')

To check the load worked correctly, we'll print the new variable:

In [None]:
print(gene_list)

As you can see, we now have the DataFrame loaded and we can move on to plotting the Venn diagram!

## Plotting the Venn Diagrams

The `venn(...)` function that we're going to use to plot the Venn diagrams needs the data supplied in a particular format: a dictionary of `set`s. Sets are *unordered* and *unindexed*, and can *never contain duplicate entries*. A `set` itself can be changed after it is created, but the items in it must be immutable (hashable) values such as strings or numbers. These objects are an implementation of the mathematical concept of a set, including the logic that goes with it, such as unions and intersections.
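
To make these properties concrete, here is a small sketch using made-up gene names (they are not taken from the notebook's data file):

```python
# Made-up gene names with one repeated entry
genes = ["BRCA1", "TP53", "BRCA1", "EGFR"]
gene_set = set(genes)

print(len(genes))     # 4 items in the list
print(len(gene_set))  # 3 items in the set: the duplicate is dropped

# A set can still be changed after it is created
gene_set.add("MYC")
print("MYC" in gene_set)  # True

# Sets are unindexed: gene_set[0] would raise a TypeError
```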

The technical aspects of this aren't important in this context though. All you need to know is that they are another type of collection and that you can create a `set` by calling the `set` function. For this case, we will need to create a `set` for each of the columns in the dataset and then create a dictionary that gives each of these sets a label. For a basic two-set diagram, the following creates the dictionary that's needed:

In [None]:
venn_dict = {'Var A': set(gene_list['A']), 'Var B': set(gene_list['B'])}

We can now plot the Venn diagram by calling the `venn(...)` function with this dictionary. We also need the `%matplotlib inline` line, which tells the notebook to display `matplotlib` plots directly below the cell:

In [None]:
%matplotlib inline
venn.venn(venn_dict)
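
The number shown in each region of a two-set diagram corresponds to an ordinary set operation. As a rough illustration, the sketch below works those numbers out by hand for two made-up sets, which stand in for two columns of the gene list:

```python
# Made-up example sets, not the notebook's real data
set_a = {"gene1", "gene2", "gene3", "gene4"}
set_b = {"gene3", "gene4", "gene5"}

only_a = set_a - set_b  # items in A but not in B
only_b = set_b - set_a  # items in B but not in A
both = set_a & set_b    # items shared by A and B

print(len(only_a), len(both), len(only_b))  # prints: 2 2 1
```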


To create Venn diagrams with three sets, we just change the dictionary:

In [None]:
venn_dict = {'Var A': set(gene_list['A']), 
             'Var B': set(gene_list['B']), 
             'Var C': set(gene_list['C'])}

venn.venn(venn_dict)

And finally, with four sets:

In [None]:
venn_dict = {'Var A': set(gene_list['A']), 
             'Var B': set(gene_list['B']), 
             'Var C': set(gene_list['C']),
             'Var D': set(gene_list['D'])}

venn.venn(venn_dict)
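
Writing out one dictionary entry per column gets repetitive, so the same dictionary can also be built with a dictionary comprehension. The sketch below uses a tiny made-up CSV rather than `venn_gene_list.csv`, and it assumes that some columns may be shorter than others. In that case `pandas` fills the empty cells with missing values, which `dropna()` removes before the set is created. The names `csv_text`, `demo_list` and `demo_dict` are made up for this example.

```python
import io
import pandas

# Made-up stand-in for the CSV file; column B has one empty cell
csv_text = "A,B\ngene1,gene2\ngene2,gene3\ngene3,\n"
demo_list = pandas.read_csv(io.StringIO(csv_text))

# Build one labelled set per column, dropping missing values first
demo_dict = {f"Var {col}": set(demo_list[col].dropna())
             for col in demo_list.columns}

for label, genes in demo_dict.items():
    print(label, sorted(genes))
```

Whether the `dropna()` step is needed depends on the data: if every column in your file has the same number of entries, it changes nothing.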

## Customising Venn Diagrams

You can also customise your Venn diagrams by changing the transparency (or `alpha`) of the image, the format of the labels on each petal and the colours you want to use. For example:


In [None]:
venn.venn(venn_dict, alpha=0.2, fmt="{percentage:.1f}%",
          cmap=["xkcd:violet", "xkcd:aquamarine", "xkcd:goldenrod", "xkcd:azure"])
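
The `fmt` value above is a standard Python format string. Assuming the library fills in the `percentage` placeholder using normal string formatting, the sketch below shows how such a label would be produced for one region. The numbers are made up for illustration.

```python
# Made-up numbers: a region holding 3 of the 8 items in total
region_size = 3
total_size = 8
percentage = 100 * region_size / total_size

# ":.1f" rounds to one decimal place; the trailing "%" is plain text
label = "{percentage:.1f}%".format(percentage=percentage)
print(label)  # 37.5%
```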

More information about the modules used in this notebook can be found here:

* [pandas](https://pandas.pydata.org/)
* [venn](https://github.com/LankyCyril/pyvenn)
* [matplotlib](https://matplotlib.org/)
