# Data Analysis

If you want to run your own analysis of the data in `db.sqlite3` using Python, you can take advantage of the project's Django code. This Jupyter notebook shows how to make that code available.

## Setup and run

To set up your environment to run this Jupyter notebook, you need to install some packages. We suggest running

~~~
$ python -m pip install -r requirements.txt
$ python -m pip install -r requirements-jupyter.txt
~~~

from your terminal.

To start the Jupyter server, run

~~~
$ python manage.py shell_plus --notebook
~~~

## Basic (Django Part)

You can use the full power of Django in the notebook. For example, to gain access to the models you can use

In [1]:
import lowfat.models as models

To select all the fellows you can use

In [2]:
fellows = models.Claimant.objects.filter(fellow=True)
fellows

<QuerySet [<Claimant: Black Widow (2017 ✓)>, <Claimant: The Hulk (2017 ✓)>, <Claimant: Green Arrow (2016 ✓)>, <Claimant: Iron Man (2016 ✓)>, <Claimant: Captain America (2015 ✓)>]>

Remember that the `Claimant` table can contain entries that aren't fellows, which is why we need `.filter(fellow=True)`.

## Basic (Pandas Part)

You can use Pandas with Django.

In [3]:
import pandas as pd

fellows = pd.DataFrame(list(fellows.values()))
fellows

Unnamed: 0,added,affiliation,application_year,attended_collaborations_workshop,attended_inaugural_meeting,bitbucket,career_stage_when_apply,carpentries_instructor,claimantship_grant,collaborator,...,screencast_url,slug,surname,terms_and_conditions_id,twitter,updated,user_id,website,website_feed,work_description
0,2016-07-07 14:59:46.412,College,2016,False,False,,3,False,3000.0,False,...,,black-widow,Widow,2017,BlackWidow,2018-02-06 15:48:04.747,3,http://black-widow.fake/,http://black-widow.fake/feed/,Work.
1,2016-07-07 14:59:46.412,University,2016,False,False,,3,False,3000.0,False,...,,the-hulk,Hulk,2017,TheHulk,2018-02-06 15:26:59.858,2,http://the-hulk.fake/,http://the-hulk.fake/feed/,Work
2,2016-07-07 14:59:46.412,University,2015,False,False,,3,False,3000.0,False,...,,green-arrow,Arrow,2016,GreenArrow,2018-02-06 15:27:13.848,4,http://green-arrow.fake/,http://green-arrow.fake/feed/,Work
3,2016-07-07 14:59:46.412,University,2015,False,False,,3,False,3000.0,False,...,,iron-man,Man,2016,IronMan,2018-02-06 15:27:25.708,5,http://iron-man.fake/,http://iron-man.fake/feed/,Tech
4,2016-07-07 14:59:46.412,College,2014,False,False,,3,False,3000.0,False,...,,captain-america,America,2015,CaptainAmerica,2018-02-06 15:27:39.366,6,http://captain-america.fake/,http://captain-america.fake/feed/,Work


To convert a Django `QuerySet` into a Pandas `DataFrame`, follow the previous example: call `.values()` on the `QuerySet` and wrap the result in `list()`, because Pandas can't process Django `QuerySet`s directly.
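As a minimal sketch of why the `list(...)` wrapper is enough: `QuerySet.values()` yields one dictionary per row, and `pd.DataFrame` accepts a list of dictionaries directly. The rows below are made-up stand-ins for what `.values("forenames", "surname", "application_year")` might return, so the block runs without a database. The column names come from the table above, but the values are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-in for
# list(queryset.values("forenames", "surname", "application_year")):
# one plain dict per database row.
rows = [
    {"forenames": "Black", "surname": "Widow", "application_year": 2016},
    {"forenames": "The", "surname": "Hulk", "application_year": 2016},
    {"forenames": "Green", "surname": "Arrow", "application_year": 2015},
]

# Each dict becomes a row and each key becomes a column.
df = pd.DataFrame(rows)

# Once the data is in a DataFrame the usual Pandas operations apply,
# for example counting fellows per application year.
per_year = df.groupby("application_year").size()
print(per_year)
```

Passing field names to `.values()` keeps the `DataFrame` narrow, which can be easier to read than the full table shown above.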

### Pandas table as CSV and as Data URIs

For the report, we need the Pandas table as CSV encoded inside a data URI so users can download the CSV file without querying the server.

In [4]:
from base64 import b64encode

csv = fellows.to_csv(
    header=True,
    index=False
)

b64encode(csv.encode())

b'YWRkZWQsYWZmaWxpYXRpb24sYXBwbGljYXRpb25feWVhcixhdHRlbmRlZF9jb2xsYWJvcmF0aW9uc193b3Jrc2hvcCxhdHRlbmRlZF9pbmF1Z3VyYWxfbWVldGluZyxiaXRidWNrZXQsY2FyZWVyX3N0YWdlX3doZW5fYXBwbHksY2FycGVudHJpZXNfaW5zdHJ1Y3RvcixjbGFpbWFudHNoaXBfZ3JhbnQsY29sbGFib3JhdG9yLGRlcGFydG1lbnQsZW1haWwsZXhhbXBsZV9vZl93cml0aW5nX3VybCxmYWNlYm9vayxmZWxsb3csZm9yZW5hbWVzLGZ1bmRpbmcsZnVuZGluZ19ub3RlcyxnZW5kZXIsZ2l0aHViLGdpdGxhYixnb29nbGVfc2Nob2xhcixncm91cCxob21lX2NpdHksaG9tZV9jb3VudHJ5LGhvbWVfbGF0LGhvbWVfbG9uLGlkLGluYXVndXJhdGlvbl9ncmFudF9leHBpcmF0aW9uLGluc3RpdHV0aW9uYWxfd2Vic2l0ZSxpbnRlcmVzdHMsaXNfaW50b190cmFpbmluZyxqb2JfdGl0bGVfd2hlbl9hcHBseSxsaW5rZWRpbixtZW50b3JfaWQsbm90ZXNfZnJvbV9hZG1pbixvcmNpZCxwaG9uZSxwaG90byxwaG90b193b3JrX2Rlc2NyaXB0aW9uLHJlY2VpdmVkX29mZmVyLHJlc2VhcmNoX2FyZWEscmVzZWFyY2hfYXJlYV9jb2RlLHJlc2VhcmNoX3NvZnR3YXJlX2VuZ2luZWVyLHNjcmVlbmNhc3RfdXJsLHNsdWcsc3VybmFtZSx0ZXJtc19hbmRfY29uZGl0aW9uc19pZCx0d2l0dGVyLHVwZGF0ZWQsdXNlcl9pZCx3ZWJzaXRlLHdlYnNpdGVfZmVlZCx3b3JrX2Rlc2NyaXB0aW9uCjIwMTYtMDctMDcgMTQ6NTk6NDYuNDEyLE

The output of `b64encode` can be included in

```
<a download="fellows.csv" href="data:application/octet-stream;charset=utf-8;base64,{{ b64encode_output | safe }}">Download the data as CSV.</a>
```

so that users can download the data.
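The encoding step can be sketched with the standard library alone. The helper name `csv_to_data_uri`, the sample CSV text, and the `text/csv;charset=utf-8` media type are assumptions made for this illustration, not part of the project. Because `b64encode` returns `bytes`, the sketch decodes the result to `str` before building the URI; formatting the raw `bytes` object into text would include the `b'...'` wrapper.

```python
from base64 import b64decode, b64encode


def csv_to_data_uri(csv_text):
    """Return a base64 data URI for the given CSV text (hypothetical helper)."""
    # str.encode() defaults to UTF-8; it is spelled out here to match the charset.
    payload = b64encode(csv_text.encode("utf-8")).decode("ascii")
    return "data:text/csv;charset=utf-8;base64," + payload


# Made-up stand-in for the string returned by fellows.to_csv(index=False).
sample_csv = "forenames,surname\nBlack,Widow\nThe,Hulk\n"

uri = csv_to_data_uri(sample_csv)

# Round trip: split off the header and decode the payload again.
header, encoded = uri.split(",", 1)
decoded = b64decode(encoded).decode("utf-8")
print(header)
print(decoded == sample_csv)
```

The round trip at the end is only a sanity check that the browser will receive the same bytes that Pandas produced.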

## Basic (Tagulous)

We use [Tagulous](http://radiac.net/projects/django-tagulous/) as a tag library.

In [5]:
funds = models.Fund.objects.all()
pd.DataFrame(list(funds.values()))

Unnamed: 0,ad_status,added,additional_info,approved,budget_approved,budget_request_attendance_fees,budget_request_catering,budget_request_others,budget_request_subsistence_cost,budget_request_travel,...,lat,lon,mandatory,notes_from_admin,required_blog_posts,start_date,status,title,updated,url
0,V,2016-07-07 14:59:46.412,,NaT,1500.0,0.0,500.0,0.0,0.0,1000.0,...,30.0518,-65.84834,False,,1,2016-09-25,P,9d6816aa - Black Widow,2016-09-05 10:53:16.027000,http://9d6816aa.com
1,V,2016-07-07 14:59:46.412,,2018-05-23 09:42:48.686967,500.0,0.0,0.0,0.0,0.0,500.0,...,30.0518,-65.84834,False,,1,2016-08-01,A,9d6816de - Black Widow,2018-05-23 09:42:48.687343,http://9d6816de.com
2,V,2016-07-07 14:59:46.412,,NaT,2000.0,0.0,0.0,0.0,0.0,2000.0,...,-13.13435,-90.89369,False,,1,2016-06-16,F,9d681148 - Captain America,2018-05-23 09:42:58.142582,http://9d681148.com
3,V,2016-07-07 14:59:46.412,,NaT,2500.0,0.0,0.0,0.0,0.0,2500.0,...,3.91522,102.7173,False,,1,2016-05-14,F,9d68144a - Green Arrow,2018-04-25 15:42:47.952000,http://9d68144a.com
4,H,2016-07-13 15:01:25.316,,NaT,0.0,0.0,0.0,0.0,0.0,1500.0,...,15.53126,48.89437,False,,1,2015-07-01,R,9d681d8c - Green Arrow,2016-09-05 10:51:42.337000,http://9d681d8c.com
5,V,2016-07-07 14:59:46.412,,2018-04-25 15:43:12.812000,2000.0,0.0,0.0,0.0,1000.0,2000.0,...,-20.63223,-178.32413,False,,1,2015-05-14,A,9d683330 - The Hulk,2018-04-25 15:43:12.814000,http://9d683330.com
6,H,2016-07-13 14:59:44.036,,NaT,0.0,0.0,1000.0,0.0,0.0,0.0,...,-73.47064,-9.87283,False,,1,2014-11-10,F,9d681b66 - Captain America,2018-04-25 15:43:45.278000,http://9d681b66.com
7,V,2016-07-07 14:59:46.412,,NaT,1000.0,0.0,0.0,0.0,0.0,1000.0,...,12.25341,173.44064,False,,1,2014-05-20,F,9d680716 - Iron Man,2018-05-23 09:43:21.692780,http://9d680716.com
8,V,2016-07-07 14:59:46.412,,NaT,1500.0,0.0,0.0,0.0,0.0,1500.0,...,167.75629,-71.04731,False,,1,2014-05-16,F,9d680248 - Iron Man,2018-04-25 15:46:51.513000,http://9d680248.com
9,V,2016-07-13 14:58:08.260,,NaT,500.0,0.0,0.0,0.0,0.0,500.0,...,46.95474,-6.73372,False,,1,2014-02-20,R,9d6818e6 - Captain America,2016-07-17 14:09:23.715000,http://9d6818e6.com


Get all tags attached to a fund:

In [6]:
funds[0].activity.all()

<TagTreeModelQuerySet []>

You can loop over each tag:

In [7]:
for tag in funds[0].activity.all():
    print(tag.name)

Filter for a specific tag:

In [8]:
models.Fund.objects.filter(activity="ssi2/fellowship")

<CastTaggedQuerySet []>

You can query for part of the name of the tag:

In [9]:
models.Fund.objects.filter(activity__name__contains="fellowship")

<CastTaggedQuerySet []>

You can also print each matching fund together with its tags:

In [10]:
for fund in models.Fund.objects.filter(activity__name__contains="fellowship"):
    print("{} - {}".format(fund, fund.activity.all()))