# Python for Data Science

- Henry Webel at [NNF CPR](https://www.cpr.ku.dk/staff/rasmussen-group/?pure=en/persons/662319)  [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Follow%20%40Henrywebel)](https://twitter.com/henrywebel)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pythontsunami/teaching/blob/intro/introduction.ipynb)

### Saving the notebook in Drive
Save a copy in your drive if you want to save your changes: `File` -> `Save a copy in Drive`


![Save Colab Notebook in Google Drive](figures/colab_save_in_drive.png)

or 

![Save Colab Notebook in Google Drive](figures/colab_save_in_drive_2.png)


**Table of Contents in Colab**
> Allows easier navigation

![Table of content in Colab](figures/colab_toc.png)

## What is Python?

- a programming language?
- a community of developers?
- PyData? Jupyter? SciPy? NumPy?
- a high-level language for using many tools

### An attempt at a great and beautiful programming language

Python is designed to be readable, consistent and easy to learn. In the words of its creator, [Guido van Rossum](https://twitter.com/gvanrossum):

> "programming languages are how programmers express and communicate ideas - and the audience for those ideas is other programmers, not computers." ([King's Day speech, 2016](http://neopythonic.blogspot.com/2016/04/kings-day-speech.html))

In [None]:
import this

In [None]:
love = this
this is love

In [None]:
love is True

In [None]:
love is False

In [None]:
love is not True or False

In [None]:
love is love

### Python Data model

- everything is an object
- operator syntax calls so-called `magic` methods (here `/` calls `__truediv__`)

In [None]:
12 / 4

In [None]:
a = 12
a.__truediv__(4)

In [None]:
from pathlib import Path
Path('path/to/') / "file"

In [None]:
a = Path('path/to/')
a.__truediv__("file")

In [None]:
a = [1, 2, 6, 9]
len(a)

In [None]:
# ?
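
One possible answer to the `# ?` above: `len(a)` calls `a.__len__()`. The sketch below uses a small, hypothetical `Playlist` class (invented for illustration, not part of any library) to show how implementing `__len__` and `__truediv__` lets your own objects work with `len()` and `/`.

```python
class Playlist:
    """Hypothetical container used only for illustration."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # called by len(playlist)
        return len(self.songs)

    def __truediv__(self, n):
        # called by playlist / n: split the songs into chunks of at most n
        return [self.songs[i:i + n] for i in range(0, len(self.songs), n)]


a = [1, 2, 6, 9]
assert len(a) == a.__len__()  # the built-in len() delegates to __len__

playlist = Playlist(["intro", "verse", "chorus", "outro", "encore"])
print(len(playlist))  # 5
print(playlist / 2)   # [['intro', 'verse'], ['chorus', 'outro'], ['encore']]
```

Giving `/` a meaning that is not numeric division is a design choice, the same one `pathlib.Path` makes for joining paths.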

### Python clients

Clients are a way to call other programs from Python through a similar (and easier) API:

- pyspark
- tensorflow
- pytorch

> Speaking Python lets you use many other tools through the libraries or clients they provide.

## Python for Data Science

- data manipulation
    - [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html)
    - [numpy](https://numpy.org/doc/stable/user/quickstart.html)
- plotting
    - [matplotlib](https://matplotlib.org/)
    - [seaborn](https://seaborn.pydata.org/introduction.html)
    - [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html)
- machine learning (science of learning from data)
    - [scikit-learn](https://scikit-learn.org/stable/)
    - [tensorflow](https://www.tensorflow.org/)
    - [pytorch](https://pytorch.org/)

### Pandas example

#### European Centre for Disease Prevention and Control (ECDC)

- [European Centre for Disease Prevention and Control (ECDC) - Testing data](https://www.ecdc.europa.eu/en/publications-data/covid-19-testing) and [ECDC daily numbers worldwide](https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide)

In [None]:
import pandas as pd
pd?

In [None]:
url_ecdc_daily_cases = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/data.csv"
url_ecdc_weekly_testing = "https://opendata.ecdc.europa.eu/covid19/testing/csv"

# dateRep is converted explicitly in a later cell, so no date parsing is requested here
ecdc_daily_cases = pd.read_csv(url_ecdc_daily_cases)
ecdc_weekly_testing = pd.read_csv(url_ecdc_weekly_testing)

In [None]:
ecdc_daily_cases.head()

In [None]:
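# dateRep is assumed to be in day/month/year order (e.g. 31/12/2020), hence dayfirst=True
ecdc_daily_cases.dateRep = pd.to_datetime(ecdc_daily_cases.dateRep, dayfirst=True)
ecdc_daily_cases.set_index('dateRep').head()  # returns a new DataFrame; ecdc_daily_cases itself is unchanged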

In [None]:
ecdc_weekly_testing.tail()

In [None]:
# alternative using query: ecdc_weekly_france = ecdc_weekly_testing.query('country == "France"')
mask = ecdc_weekly_testing.country == "France"
ecdc_weekly_france = ecdc_weekly_testing.loc[mask]
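
The two selection styles in the cell above (boolean mask and `query`) can be compared on a small made-up table. The column names mirror the ECDC ones, but the values are invented for illustration only.

```python
import pandas as pd

# toy data: values are invented and are NOT real ECDC numbers
toy = pd.DataFrame({
    "country": ["France", "Denmark", "France", "Denmark"],
    "year_week": ["2020-W40", "2020-W40", "2020-W41", "2020-W41"],
    "positivity_rate": [7.5, 1.0, 9.0, 1.5],
})

# boolean mask: one True/False value per row, passed to .loc
mask = toy.country == "France"
by_mask = toy.loc[mask]

# query: the same filter written as a string expression
by_query = toy.query('country == "France"')

print(by_mask.equals(by_query))  # True
```

Masks are plain Python expressions and easy to combine with `&` and `|`; `query` tends to read better for long conditions.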

Check out the documentation of [`pandas.DataFrame.plot`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html)

In [None]:
# plot is called as a method on a DataFrame instance
# pd.DataFrame.plot?

In [None]:
ecdc_weekly_france.set_index("year_week").plot(y='positivity_rate', rot=90)

In [None]:
slice?

In [None]:
df = ecdc_weekly_testing.set_index(["year_week", "country"])
df.loc[(slice(None),"France"),:]
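
To see what `slice(None)` does in the selection above, here is a minimal sketch on a made-up table with the same two index levels. The values are invented; `pd.IndexSlice` is shown as an equivalent spelling.

```python
import pandas as pd

# toy data: values are invented and are NOT real ECDC numbers
toy = pd.DataFrame({
    "year_week": ["2020-W40", "2020-W40", "2020-W41", "2020-W41"],
    "country": ["Denmark", "France", "Denmark", "France"],
    "positivity_rate": [1.0, 7.5, 1.5, 9.0],
}).set_index(["year_week", "country"]).sort_index()

# slice(None) means "take everything" on the first index level (year_week)
with_slice = toy.loc[(slice(None), "France"), :]

# pd.IndexSlice allows the usual colon syntax for the same selection
idx = pd.IndexSlice
with_index_slice = toy.loc[idx[:, "France"], :]

print(with_slice.equals(with_index_slice))  # True
print(list(with_slice.index))  # [('2020-W40', 'France'), ('2020-W41', 'France')]
```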

In [None]:
import matplotlib.pyplot as plt
# fig, ax = plt.subplots(figsize=(15,15))
ax = None
_ = df["positivity_rate"].unstack().plot(ax=ax, y=None, rot=90, xlabel='week')
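
The `unstack()` call in the plotting cell is what produces one line per country: it moves the innermost index level (`country`) into the columns. A minimal sketch on invented data:

```python
import pandas as pd

# toy data: values are invented and are NOT real ECDC numbers
toy = pd.DataFrame({
    "year_week": ["2020-W40", "2020-W40", "2020-W41", "2020-W41"],
    "country": ["Denmark", "France", "Denmark", "France"],
    "positivity_rate": [1.0, 7.5, 1.5, 9.0],
}).set_index(["year_week", "country"])

# long format (one row per week and country) -> wide format (one column per country)
wide = toy["positivity_rate"].unstack()
print(wide)
```

Calling `wide.plot()` would then draw one line per column, with `year_week` on the x-axis.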

## Web programming

### Native websites
- [Django](https://www.djangoproject.com/), used for [large websites](https://hackernoon.com/10-popular-websites-built-with-django-906cc310aa0a)

### Dashboards

> mostly run locally, but can in principle be deployed

- [dash](https://plotly.com/dash/)
- [voila](https://github.com/voila-dashboards/voila)
- [panel](https://panel.holoviz.org/)
- [ipywidgets](https://ipywidgets.readthedocs.io/en/latest/)
- [streamlit](https://www.streamlit.io/)

## Integrated Development Environments

There are many IDEs to choose from, for example:

- [VSCode](https://code.visualstudio.com/)
- [spyder](https://www.spyder-ide.org/), try [here](https://mybinder.org/v2/gh/spyder-ide/spyder/4.x?urlpath=/desktop)
- [jupyter lab](https://jupyter.org/)

## Jupyter

- python vs ipython
- jupyter lab and notebook

IPython adds [magic functions](https://ipython.readthedocs.io/en/stable/interactive/magics.html), e.g. the [`%time`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-time) magic:

In [None]:
%%time
a = list(range(100_000_000)) # IPython is used in Notebooks

In [None]:
a[11314]

Don't build large numeric sequences as Python lists like this; use NumPy if you really need them.

In [None]:
%%time
import numpy as np
a = np.arange(100_000_000)
a[11314]
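
`%%time` only exists inside IPython and Jupyter. In a plain Python script, a rough equivalent can be sketched with `time.perf_counter` from the standard library. The helper name `timed` is hypothetical, it measures wall-clock time only, and the list is kept smaller than in the cells above so the example runs quickly.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f} s")


with timed("build list"):
    a = list(range(1_000_000))

print(a[11314])  # 11314
```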

IPython ships with many utility functions:

In [None]:
from IPython.core.debugger import set_trace
set_trace?

In [None]:
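# Don't use this hand-rolled helper: the standard library already covers it.
def remove_suffix(name, suffix=' '):
    _list = name.split(suffix)
#     set_trace()  # uncomment to step through the function in the debugger
    if len(_list) > 1:
        # caveat: also drops any earlier occurrences of suffix inside name
        return "".join(_list[:-1])
    else:
        return _list[0]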

name = "my/long/name.ext"
remove_suffix(name, suffix='.ext')

In [None]:
import os
os.path.splitext(name)

In [None]:
from pathlib import Path
Path(name).with_suffix('')

How do you remember this? Do you?

- [link](https://stackoverflow.com/a/3548689/9684872) to Stack Overflow
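
Since Python 3.9, strings also have a built-in `removesuffix` method that covers the hand-written `remove_suffix` case. A short side-by-side of the three standard-library options, assuming Python 3.9 or newer:

```python
import os
from pathlib import Path

name = "my/long/name.ext"

# str.removesuffix: removes the suffix only if the string ends with it
print(name.removesuffix(".ext"))              # my/long/name

# os.path.splitext: returns a (root, extension) tuple
print(os.path.splitext(name))                 # ('my/long/name', '.ext')

# pathlib: replace the suffix with an empty one; as_posix() keeps forward slashes
print(Path(name).with_suffix("").as_posix())  # my/long/name
```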

## Resources

[learnpython.org](https://www.learnpython.org/)

### For non-programmers

- python.org [references](https://wiki.python.org/moin/BeginnersGuide/NonProgrammers)
- DataCamp, Udemy, Coursera, Codecademy, Udacity, edX

### For programmers (R, Java, C)
 - [official tutorial](https://docs.python.org/3/tutorial/)
 - [Fluent Python, 2021](https://www.oreilly.com/library/view/fluent-python-2nd/9781492056348/)
 - [introductions](https://wiki.python.org/moin/IntroductoryBooks)
 - [freecodecamp - intermediate](https://www.youtube.com/watch?v=HGOBQPFzWKo)
 - [Hitchhiker's Guide to Python](https://docs.python-guide.org/)