# Introduction to parallel programming using Python

For years, clock frequency, measured in hertz (Hz), was the main figure used to gauge how many instructions a processor could execute in a given time interval.

The higher the clock frequency, the more instructions are executed per second. Frequencies are quoted in kHz (thousands of cycles per second), MHz (millions of cycles per second) and, today, GHz (billions of cycles per second). **But clock frequency can no longer be raised significantly because of physical limits, so processors now gain performance by adding more cores to the same chip.**





![SegmentLocal](images/cpu.gif "segment")

In [3]:
import warnings
warnings.filterwarnings('ignore')
import sys

# 1. Serial Computing

Source: https://computing.llnl.gov/tutorials/parallel_comp/

Traditionally, software has been written for serial computation:

- A problem is broken into a discrete series of instructions
- Instructions are executed sequentially one after another
- Executed on a single processor
- Only one instruction may execute at any moment in time

<img src="images/serialProblem.gif" width="600" align="middle">



# 2. Parallel Computing
Source: https://computing.llnl.gov/tutorials/parallel_comp/

In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:

- A problem is broken into discrete parts that can be solved concurrently
- Each part is further broken down to a series of instructions
- Instructions from each part execute simultaneously on different processors
- An overall control/coordination mechanism is employed
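
The four steps above can be sketched with the standard library. This is a minimal illustration and is not taken from the LLNL tutorial: the helper names `split_into_chunks`, `partial_sum` and `parallel_sum` are hypothetical, and summing a list is used only because it decomposes cleanly. It is written as a standalone script, hence the `__main__` guard.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(data, n_chunks):
    """Break the problem into up to n_chunks independent parts."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum(chunk):
    """The series of instructions applied to one part."""
    total = 0
    for value in chunk:
        total += value
    return total


def parallel_sum(data, n_chunks=4):
    """Coordinate the work: distribute the parts, then combine the partial results."""
    chunks = split_into_chunks(data, n_chunks)
    with ProcessPoolExecutor(max_workers=n_chunks) as pool:
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    numbers = list(range(100_000))
    print(parallel_sum(numbers) == sum(numbers))
```

For a task this cheap, the cost of starting processes and shipping data to them will likely outweigh any gain; the point is only the structure of split, compute, and combine.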

<img src="images/parallelProblem.gif" width="600" align="middle">



# 3. Parallel Programming in Python

<img src="images/notparallel.png" width="600" align="middle">

### 3.1 Global Interpreter Lock

The Global Interpreter Lock, or GIL for short, is the mechanism that prevents CPython (the reference implementation of Python, written in C) from executing bytecode in several threads at once. It has long been a subject of discussion and debate on the Python developers' mailing lists.
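
A small experiment can make the effect of the GIL visible. The sketch below is not from the original notebook: it runs a CPU-bound function (the hypothetical `count_down`) twice in a row and then in two threads. On a standard CPython build with the GIL enabled, the threaded version is usually no faster, because only one thread executes bytecode at a time. Timings depend on the machine and the Python version, so none are shown here.

```python
import time
from threading import Thread


def count_down(n):
    """Pure-Python, CPU-bound loop: it needs the GIL for every iteration."""
    while n > 0:
        n -= 1
    return n


def run_serial(n, repeats=2):
    """Time `repeats` calls executed one after another."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, repeats=2):
    """Time the same calls, each in its own thread."""
    threads = [Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 5_000_000
    print(f"serial:   {run_serial(N):.2f} s")
    print(f"threaded: {run_threaded(N):.2f} s")
```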

### 3.2 Multiprocessing

**Pros**

- Separate memory space
- Code is usually straightforward
- Takes advantage of multiple CPUs & cores
- Avoids GIL limitations for CPython
- Eliminates most needs for synchronization primitives unless you use shared memory (instead, it's more of a communication model for IPC)
- Child processes are interruptible/killable
- Python `multiprocessing` module includes useful abstractions with an interface much like `threading.Thread`
- A must with CPython for CPU-bound processing

**Cons**

- IPC is a little more complicated, with more overhead (communication model vs. shared memory/objects)
- Larger memory footprint

Source: https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python
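
The "separate memory space" and "communication model" points can be illustrated with a short sketch. It is not from the cited answer, and the names `local_list` and `append_and_report` are hypothetical. The child process appends to a module-level list, but it only changes its own copy; the parent sees the result only because the child sends it back explicitly through a `multiprocessing.Queue`.

```python
import multiprocessing as mp

local_list = []  # every process works on its own copy of this list


def append_and_report(value, queue):
    """Runs in the child: mutate the child's copy, then send a snapshot back."""
    local_list.append(value)
    queue.put(list(local_list))


if __name__ == "__main__":
    queue = mp.Queue()
    child = mp.Process(target=append_and_report, args=(42, queue))
    child.start()
    child_view = queue.get()  # explicit inter-process communication
    child.join()

    print("child saw: ", child_view)   # the child's copy contains 42
    print("parent has:", local_list)   # the parent's copy is still empty
```

The absence of shared state is what removes most of the need for locks, and the explicit `queue.put` / `queue.get` pair is the extra IPC cost listed under the cons.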

In [4]:
import multiprocessing
import numpy as np
multiprocessing.cpu_count()

8

This machine has 4 physical cores but reports 8 logical processors, because the processor supports hyper-threading.
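
`multiprocessing.cpu_count()` counts logical processors, not physical cores. As a small standard-library sketch: `os.cpu_count()` normally reports the same logical count, and on platforms that provide it (Linux, for example) `os.sched_getaffinity(0)` gives the set of CPUs the current process is actually allowed to run on, which can be smaller inside containers or batch jobs. Counting physical cores requires a third-party package and is not shown.

```python
import os

logical_cpus = os.cpu_count()  # logical processors; None if it cannot be determined

# sched_getaffinity is only available on some platforms (e.g. Linux).
if hasattr(os, "sched_getaffinity"):
    usable_cpus = len(os.sched_getaffinity(0))
else:
    usable_cpus = logical_cpus

print("logical CPUs:", logical_cpus)
print("usable by this process:", usable_cpus)
```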

Serial function:

In [2]:
%%time
# Sequential code
from time import sleep

resultado = []

for i in range(4):
    sleep(1)
    resultado.append(i+1)

CPU times: user 760 µs, sys: 1.18 ms, total: 1.94 ms
Wall time: 4.02 s


Parallel function:

In [3]:
%%time

from concurrent.futures import ProcessPoolExecutor
e = ProcessPoolExecutor()

def incrementar(x):
    sleep(1)
    return x + 1

resultado = list(e.map(incrementar,range(4)))

CPU times: user 19.2 ms, sys: 27.8 ms, total: 47 ms
Wall time: 1.05 s


### Exercise: Create a parallel function that creates 20 CSV files with processes
----

Serial function that solves the problem:

In [4]:
%%time
np.random.seed(123)
x = np.random.poisson(20,(1000,1000))
for i_ in range(0, 20):
    np.savetxt('data_serial/x%06d.csv' % i_, x, delimiter=',', fmt='%d')

CPU times: user 2.96 s, sys: 77.1 ms, total: 3.04 s
Wall time: 3.06 s


**Create the parallel function that solves the problem:**

In [5]:
# your code here ...

### References
---

https://github.com/rsnemmen/parallel-python-tutorial/blob/master/Parallel%20Computing%20with%20Python%20public.ipynb

https://github.com/rsnemmen/parallel-python-tutorial

https://github.com/dask/dask-tutorial

https://docs.python.org/3/library/concurrent.futures.html

### 3.3 Threading


**Pros**

- Lightweight - low memory footprint
- Shared memory - makes access to state from another context easier
- Allows you to easily make responsive UIs
- CPython C extension modules that properly release the GIL will run in parallel
- Great option for I/O-bound applications

**Cons**

- CPython - subject to the GIL
- Not interruptible/killable
- If not following a command queue/message pump model (using the `queue` module), then manual use of synchronization primitives becomes a necessity (decisions are needed for the granularity of locking)
- Code is usually harder to understand and to get right - the potential for race conditions increases dramatically

Source: https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python
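
The command-queue model mentioned in the cons can be sketched as follows. This is an illustrative example rather than code from the cited answer; `worker`, `square_all` and the `STOP` sentinel are hypothetical names. Threads never touch shared state directly: they take work from one `queue.Queue` and put results on another, so no explicit lock is needed.

```python
import queue
import threading

STOP = None  # sentinel telling a worker to exit


def worker(tasks, results):
    """Take numbers from the task queue and put their squares on the results queue."""
    while True:
        item = tasks.get()
        if item is STOP:
            break
        results.put(item * item)


def square_all(values, n_workers=3):
    """Square every value using a small pool of worker threads."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for v in values:
        tasks.put(v)
    for _ in threads:
        tasks.put(STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    # Results arrive in completion order, so sort them for a stable answer.
    return sorted(results.get() for _ in values)


print(square_all([1, 2, 3, 4, 5]))
```

`queue.Queue` does its own locking internally, which is why this pattern avoids the manual synchronization the last two cons warn about.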

In [2]:
'pyvo' in sys.modules

False

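`sys.modules` only lists modules already imported in this session, so `False` here does not by itself mean PyVO is missing. If `import pyvo` fails, install it: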
    
!pip install pyvo

In [9]:
import pyvo as vo
service = vo.dal.TAPService("http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/tap")

In [None]:
service.available

In [16]:
import pyvo as vo
import concurrent.futures
import urllib.request
from concurrent.futures import ThreadPoolExecutor

In [17]:
URLS = ['http://reg.g-vo.org/tap',
        'http://dc.g-vo.org/tap',
        'http://dc.zah.uni-heidelberg.de/tap',
       'https://vo.chivo.cl/tap']

In [18]:
%%time
for url in URLS:
    service = vo.dal.TAPService(url)
    print("Url = "+url+" Disponible = "+str(service.available)+"Timeup ="+str(service.up_since))

Url = http://reg.g-vo.org/tap Disponible = TrueTimeup =2019-03-20T13:12:21Z
Url = http://dc.g-vo.org/tap Disponible = TrueTimeup =2019-03-20T13:12:21Z
Url = http://dc.zah.uni-heidelberg.de/tap Disponible = TrueTimeup =2019-03-20T13:12:21Z
Url = https://vo.chivo.cl/tap Disponible = TrueTimeup =2018-12-23T16:53:15Z
CPU times: user 29.4 ms, sys: 4.92 ms, total: 34.3 ms
Wall time: 1.71 s


In [19]:
def vo_tap(url):
    service = vo.dal.TAPService(url)
    print("Url = "+url+" Disponible = "+str(service.available)+"Timeup ="+str(service.up_since))
    return(service.available)

In [20]:
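%%time
executor = ThreadPoolExecutor(max_workers=5)

# Pitfall: vo_tap(...) is called right here, in the main thread, and only its
# return value is handed to submit(), so the four checks run one after another.
# The next cell passes the function and its argument separately instead.
task1 = executor.submit(vo_tap("https://vo.chivo.cl/tap"))
task2 = executor.submit(vo_tap("http://reg.g-vo.org/tap"))
task3 = executor.submit(vo_tap("http://dc.g-vo.org/tap"))
task4 = executor.submit(vo_tap("http://dc.zah.uni-heidelberg.de/tap"))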

Url = https://vo.chivo.cl/tap Disponible = TrueTimeup =2018-12-23T16:53:15Z
Url = http://reg.g-vo.org/tap Disponible = TrueTimeup =2019-03-20T13:12:21Z
Url = http://dc.g-vo.org/tap Disponible = TrueTimeup =2019-03-20T13:12:21Z
Url = http://dc.zah.uni-heidelberg.de/tap Disponible = TrueTimeup =2019-03-20T13:12:21Z
CPU times: user 27 ms, sys: 5.07 ms, total: 32.1 ms
Wall time: 1.73 s


In [21]:
%%time
future_to_url = {executor.submit(vo_tap, url): url for url in URLS}
for future in concurrent.futures.as_completed(future_to_url):
    url = future_to_url[future]

Url = https://vo.chivo.cl/tap Disponible = TrueTimeup =2018-12-23T16:53:15Z
Url = http://dc.g-vo.org/tap Disponible = TrueTimeup =2019-03-20T13:12:21Z
Url = http://dc.zah.uni-heidelberg.de/tap Disponible = TrueTimeup =2019-03-20T13:12:21Z
Url = http://reg.g-vo.org/tap Disponible = TrueTimeup =2019-03-20T13:12:21Z
CPU times: user 28 ms, sys: 6.04 ms, total: 34 ms
Wall time: 573 ms
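
The `as_completed` loop is also the natural place to collect return values and handle failures. The sketch below assumes a hypothetical `fake_check` function that stands in for a network call such as `vo_tap` (it sleeps instead of contacting a server), so it runs without PyVO or network access. `check_all` is likewise an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_check(name):
    """Hypothetical stand-in for an I/O-bound availability check."""
    time.sleep(0.05)  # simulate network latency
    if name.startswith("bad"):
        raise ConnectionError(f"{name} is unreachable")
    return True


def check_all(names, max_workers=5):
    """Return {name: True, or the exception that was raised} for every name."""
    status = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Pass the callable and its argument separately so the call runs in a worker.
        future_to_name = {executor.submit(fake_check, name): name for name in names}
        for future in as_completed(future_to_name):
            name = future_to_name[future]
            try:
                status[name] = future.result()  # re-raises the worker's exception
            except ConnectionError as exc:
                status[name] = exc
    return status


status = check_all(["service-a", "service-b", "bad-service"])
for name, value in sorted(status.items()):
    print(name, "->", value)
```

Using the executor as a context manager waits for all submitted work and releases the threads when the block ends.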



# ipyaladin

A bridge between Jupyter and Aladin Lite, enabling interactive sky visualization in IPython notebooks.

Source: https://github.com/cds-astro/ipyaladin

In [11]:
'ipyaladin' in sys.modules
'astroquery' in sys.modules

True

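Only the last expression in a cell is displayed, so the `True` above refers to astroquery. These checks only show whether a package has already been imported in this session. If `import ipyaladin` fails, install it:

!pip install ipyaladin

If `import astroquery` fails, install it:

!pip install --pre astroquery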

Then, make sure to enable widgetsnbextension and ipyaladin:

In [22]:
!jupyter nbextension enable --py widgetsnbextension

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK


In [23]:
!jupyter nbextension enable --py --sys-prefix ipyaladin

Enabling notebook extension ipyaladin/extension...
      - Validating: OK


In [5]:
import ipyaladin as ipyal
from astroquery.simbad import Simbad
import astropy.units as u

In [6]:
Simbad.SIMBAD_URL = 'http://simbad.harvard.edu/simbad/sim-script'

In [9]:
aladin1 = ipyal.Aladin(fov=0.45, target='NGC4782')  # other targets to try: m82, m1
aladin1

Aladin(fov=0.45, options=['allow_full_zoomout', 'coo_frame', 'fov', 'full_screen', 'log', 'overlay_survey', 'o…

In [10]:
table1 = Simbad.query_region("NGC4782", radius=0.04 * u.deg)
type(table1)

NoneType