## The ecosystem

Choosing the right set of tools for your problem might be one of the most challenging decisions you'll face, simply because of the variety of tools available in the ecosystem and how specific each case is.
Learn about the ecosystem you're working with.

Many libraries leverage the speed of C, C++ or Fortran. Rather than relying only on the Python standard library or writing your own optimizations, it is worth looking at what's already available in your ecosystem.

### Interpreters and JIT compilers:

**Numba**: An open-source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Its documentation also offers [performance tips](https://numba.pydata.org/numba-doc/latest/user/performance-tips.html). In the example below, `@njit(parallel=True)` together with `prange` lets the loop iterations run in parallel:

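```python
import numpy as np
from numba import njit, prange

@njit
def run_sim():
    return np.random.random()  # hypothetical stand-in for one simulation run

@njit(parallel=True)
def simulator(out):
    for i in prange(out.shape[0]):  # iterations are distributed across threads
        out[i] = run_sim()
```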

**PyPy**: An alternative Python interpreter with a built-in JIT compiler. It is unlikely to speed up:

- Short-running processes: if the program doesn't run for at least a few seconds, the JIT compiler won't have enough time to warm up.

- Code that spends most of its time in runtime libraries (i.e. in C functions) rather than actually running Python code: the JIT compiler will not help there.

PyPy works best when executing long-running programs where a significant fraction of the time is spent executing Python code. According to the PyPy project, this is the case covered by most of its benchmarks, though not all of them: the goal of PyPy is to get speed while still supporting any Python program.
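As a rough illustration, the kind of workload PyPy tends to accelerate is a CPU-bound loop written in plain Python, with no calls into C libraries. The sketch below is a hypothetical benchmark (the `count_primes` function and the limit are illustrative assumptions, not taken from PyPy's documentation). Running the same file with `python` and with PyPy's interpreter (often installed as `pypy3`) is a simple way to check whether your own code falls into this category.

```python
import time


def count_primes(limit: int) -> int:
    """Count primes below `limit` by trial division (pure-Python, CPU-bound)."""
    count = 0
    for n in range(2, limit):
        d = 2
        while d * d <= n:
            if n % d == 0:
                break  # found a divisor: n is composite
            d += 1
        else:
            count += 1  # the while loop never hit `break`, so n is prime
    return count


if __name__ == "__main__":
    start = time.perf_counter()
    primes = count_primes(100_000)
    elapsed = time.perf_counter() - start
    # Compare this timing between CPython and PyPy
    print(f"{primes} primes below 100,000 in {elapsed:.2f} s")
```

Keep the warm-up caveat above in mind: if the run finishes in well under a few seconds, raise the limit before drawing conclusions from the comparison.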

### Tools:

**Astropy**: Its documentation includes [recommendations](https://docs.astropy.org/en/stable/units/index.html#astropy-units-performance) on writing performant code with `astropy.units`.

**NumPy**: Provides N-dimensional arrays and tools for manipulating them, along with many kinds of mathematical functions, random number generators, linear algebra routines, and Fourier transforms.
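To illustrate why reaching for NumPy usually beats a hand-written Python loop, here is a minimal sketch that computes the root mean square of a sequence both ways. The function names `rms_loop` and `rms_numpy` are hypothetical; the point is that the vectorized version moves the per-element work out of the interpreter and into NumPy's compiled code.

```python
import math
import timeit
from functools import partial

import numpy as np


def rms_loop(values):
    """Root mean square with a plain Python loop: one interpreter step per element."""
    total = 0.0
    for v in values:
        total += v * v
    return math.sqrt(total / len(values))


def rms_numpy(values):
    """Same computation vectorized: squaring and averaging run in NumPy's compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.sqrt(np.mean(arr * arr)))


if __name__ == "__main__":
    data = np.random.default_rng(seed=0).normal(size=1_000_000)
    as_list = data.tolist()  # native Python floats: the loop's best case
    for name, func, arg in (("loop", rms_loop, as_list), ("numpy", rms_numpy, data)):
        seconds = timeit.timeit(partial(func, arg), number=5) / 5
        print(f"{name}: result {func(arg):.6f}, {seconds * 1000:.1f} ms per call")
```

Both functions return the same value up to floating-point rounding; only where the loop runs changes.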

**pandas**: Offers data structures and operations for manipulating numerical tables and time series. See the pandas guide on [enhancing performance](https://pandas.pydata.org/docs/user_guide/enhancingperf.html) for its recommendations.
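A general rule of thumb with pandas is to prefer whole-column (vectorized) operations over row-by-row `DataFrame.apply`. The sketch below uses made-up order data with hypothetical `price` and `quantity` columns, and also shows `DataFrame.eval`, one of the techniques covered in the linked guide.

```python
import numpy as np
import pandas as pd

# Hypothetical order data, one row per order (column names are illustrative)
rng = np.random.default_rng(seed=0)
n_rows = 10_000
df = pd.DataFrame({
    "price": rng.uniform(1.0, 100.0, size=n_rows),
    "quantity": rng.integers(1, 10, size=n_rows),
})

# Slow: apply(axis=1) calls the Python lambda once per row
total_apply = df.apply(lambda row: row["price"] * row["quantity"], axis=1)

# Fast: a single multiplication over whole columns, executed by NumPy
total_vectorized = df["price"] * df["quantity"]

# DataFrame.eval evaluates a string expression over the columns; the pandas
# guide discusses when it pays off (mainly large frames, compound expressions)
total_eval = df.eval("price * quantity")
```

All three produce the same totals; timing them (for example with `%timeit` in IPython) on your own data shows how large the gap is in practice.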

**Dask**: Most NumPy and SciPy functions are implemented in C or C++ and release Python's GIL (Global Interpreter Lock), so they can use all CPU cores when called from multiple threads. The Dask project builds on this to parallelize NumPy, pandas, and scikit-learn workloads, on a single machine or across a cluster of machines. If you're using Dask, I can't recommend their [best practices](https://docs.dask.org/en/stable/best-practices.html) section enough. The snippet below contrasts pandas and NumPy code with its Dask equivalent:

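```python
import h5py
import numpy as np
import pandas as pd
import dask.array as da
import dask.dataframe as dd

# pandas: a single CSV file, held in memory
df = pd.read_csv('2015-01-01.csv')
df.groupby(df.user_id).value.mean()
# Dask: many CSV files, read lazily in parallel; .compute() runs the work
ddf = dd.read_csv('2015-*-*.csv')
ddf.groupby(ddf.user_id).value.mean().compute()

# NumPy: a small dataset loaded fully into memory, centered row by row
f = h5py.File('myfile.hdf5', 'r')
x = np.array(f['/small-data'])
x - x.mean(axis=1, keepdims=True)
# Dask: a large dataset processed in 1000x1000 chunks
y = da.from_array(f['/big-data'], chunks=(1000, 1000))
(y - y.mean(axis=1, keepdims=True)).compute()
```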

### Languages:

**Cython**: The Cython language is a superset of the Python language that additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code. The C code is generated once and then compiles with all major C/C++ compilers. The following timings compare the same computation implemented in several ways:

| Method           | Time (ms) | Speedup vs. Python | Speedup vs. NumPy |
|------------------|-----------|--------------------|-------------------|
| Pure Python      | 183       | 1×                 | 0.03×             |
| NumPy            | 5.97      | 31×                | 1×                |
| Naive Cython     | 7.76      | 24×                | 0.8×              |
| Optimised Cython | 2.18      | 84×                | 2.7×              |
| Cython calling C | 2.22      | 82×                | 2.7×              |
