# How to get involved in NumPy

### Matti Picus
### Data Umbrella webcast Dec 2, 2020

https://github.com/mattip/archive/blob/data_umbrella/content/data_umbrella_dec_2_2020/data_umbrella.ipynb


# What will we do in the next hour?
- A brief history of NumPy: when, what, who?
- What drives NumPy: what are the goals?
- Communication channels and github repos
- Building and testing NumPy
- Take a look at some issues and PRs
- Q&A

# But first:

Matti Picus, 
- Quansight developer. Previously employed by the Berkeley Institute for Data Science (BIDS) to work full time on NumPy.
- Avid Open Source evangelist: have converted many Matlab users to Python and told a few that Matlab will suit their workflow better.
- PyPy core contributor, and drive-by contributor to many other projects.
- "Kibbutznik".
- Believe that diversity and inclusion are important to human society, tech, and open source.

# History

<img src="history.png" width=75%>

<img src="../../presentation_resources/NumPy_info_header.svg">

History: how the library got started, who the core contributors are, where they are located, how funding works, how many are volunteers, and governance.


# What will we do in the next hour?
- A brief history of NumPy: when, what, who?
- **What drives NumPy: what are the goals?**
- Communication channels and github repos
- Building and testing NumPy
- Take a look at some issues and PRs
- Q&A

In [None]:
# About us: Steering Council, Teams, Sponsors, Partners
from IPython.display import IFrame, display
display(IFrame("https://numpy.org/about/index.html", width="100%", height=400))

quick overview, don't dive too deep

In [None]:
# NumPy development is driven via its Roadmap and NEPS
from IPython.display import IFrame, display
display(IFrame("https://numpy.org/neps/", width="100%", height=400))

Roadmap -> current goals: interoperability is #1

# "Philosophy" of NumPy: a 20-year-old CPU-based array object library

- Don't break anything
- A nice way to work with a chunk of memory on the CPU: strides, shape, dtype
- Performance is important
- Interoperability beats performance
- Don't get too wrapped up in knowing the whole API or everything about 
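To make "strides, shape, dtype" concrete, here is a small sketch that uses only public `ndarray` attributes. The byte counts assume a C-contiguous `int64` array, so each element takes 8 bytes.

```python
import numpy as np

# A 3x4 array of 8-byte integers: one contiguous chunk of 3 * 4 * 8 = 96 bytes.
a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a.shape, a.dtype, a.nbytes)   # (3, 4) int64 96
print(a.strides)                    # (32, 8): 32 bytes to the next row, 8 to the next column

# Element (i, j) lives at byte offset i * strides[0] + j * strides[1].
i, j = 2, 1
offset = i * a.strides[0] + j * a.strides[1]
print(offset, a.ravel()[offset // a.itemsize], a[i, j])   # 72 9 9

# The transpose is a view of the same memory with swapped shape and strides;
# no data is copied.
t = a.T
print(t.shape, t.strides)           # (4, 3) (8, 32)
print(np.shares_memory(a, t))       # True
```

Because a view only rewrites the shape and strides, operations like transposing and slicing stay cheap no matter how large the chunk of memory is.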


# "Philosophy" of NumPy: provide ways to interact with others
- Provide a minimum of tools to work with the chunk of memory, delegate to other libraries
  - In: fft, random, linalg, distutils, polynomial, f2py
  - Out: optimization (SciPy), image loading (scikit-image, Pillow), GPU (CuPy, JAX)
- Provide protocols to enable overriding `ndarray`: `__array__`, `__array_wrap__`, `__array_struct__`, `__array_priority__` for [subclassing](https://numpy.org/doc/stable/user/basics.subclassing.html?highlight=__array_wrap__#array-wrap-for-ufuncs-and-other-functions)
- Provide protocols to make overriding functions transparent
  - `__array_ufunc__`, `__array_function__`, `__array_module__` (the last one proposed in NEP 37)

From the beginning, NumPy was built to be the array format for SciPy: a clear delineation of responsibility.
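The two function-overriding protocols can be sketched with a toy container. `WrappedArray`, `HANDLED_FUNCTIONS` and `implements` are hypothetical names invented for this example; the registry-and-decorator pattern mirrors the one used in NEP 18. A real library such as CuPy would keep its data on the GPU rather than in an `ndarray`.

```python
import numpy as np

HANDLED_FUNCTIONS = {}  # hypothetical registry: NumPy function -> our implementation


def implements(np_function):
    """Register an implementation of a NumPy function for WrappedArray."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator


class WrappedArray:
    """Hypothetical duck array that stores its data in a plain ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Reached by ufunc calls such as np.add(x, y) and methods like np.add.reduce(x).
        if "out" in kwargs:
            return NotImplemented  # keep the sketch simple: no out= support
        raw = [x.data if isinstance(x, WrappedArray) else x for x in inputs]
        return WrappedArray(getattr(ufunc, method)(*raw, **kwargs))

    def __array_function__(self, func, types, args, kwargs):
        # Reached by non-ufunc functions such as np.mean or np.concatenate.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented  # NumPy then raises TypeError
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.mean)
def _mean(a, axis=None):
    return WrappedArray(np.mean(a.data, axis=axis))


x = WrappedArray([1.0, 2.0, 3.0])
print(type(np.add(x, x)).__name__)   # WrappedArray, via __array_ufunc__
print(np.mean(x).data)               # 2.0, via __array_function__
```

Returning `NotImplemented` for unregistered functions is a deliberate choice: NumPy then raises a `TypeError` instead of silently converting the duck array.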

In [None]:
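import numpy as np
import cupy  # requires a GPU supported by CuPy

a = cupy.zeros([10, 10], dtype=np.int64)
a.sum()             # a method of cupy.ndarray (CuPy arrays do not subclass ndarray)
np.add(a, a)        # ufuncs dispatch via __array_ufunc__
np.sum(a)           # np.sum is not a ufunc: it dispatches via __array_function__
np.mean(a)          # needs __array_function__
b = np.arange(start=0, stop=1_000, like=a)   # like= (NEP 35, NumPy >= 1.20) also dispatches via __array_function__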

But even all those protocols are not enough

In [None]:
# Data API - without even importing NumPy
from IPython.display import IFrame, display
display(IFrame("https://data-apis.github.io/array-api/latest/", width="100%", height=400))
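The idea behind the array API standard is that library-agnostic code asks an array for its namespace instead of importing NumPy directly. This is a sketch that assumes arrays implementing the standard's `__array_namespace__()` method (recent NumPy releases provide it on `ndarray`); `get_namespace` is a hypothetical helper, and its NumPy fallback exists only so the example also runs on older versions.

```python
def get_namespace(x):
    """Return the array API namespace of x (hypothetical helper)."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    # Fallback for arrays that predate the standard; only NumPy is assumed here.
    import numpy
    return numpy


def softmax(x):
    # Uses only functions defined by the array API standard: max, exp, sum.
    xp = get_namespace(x)
    e = xp.exp(x - xp.max(x))   # subtract the max for numerical stability
    return e / xp.sum(e)


import numpy as np

p = softmax(np.asarray([1.0, 2.0, 3.0]))
print(p, float(p.sum()))
```

The same `softmax` would work unchanged on any array whose library implements the standard, which is exactly what the per-function protocols above cannot guarantee.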

# What will we do in the next hour?
- A brief history of NumPy: when, what, who?
- What drives NumPy: what are the goals?
- **Communication channels and github repos**
- Building and testing NumPy
- Take a look at some issues and PRs
- Q&A

# Getting heard and getting things done
- github.com/numpy: the go-to day-to-day tool
- numpy-discussion mailing list: for bigger topics


# Major repos

Open https://github.com/numpy in another tab
- code and documentation - https://github.com/numpy/numpy
- website - https://github.com/numpy/numpy.org
  - https://numpy.org
- NEW! tutorials written in MyST-NB markdown - https://github.com/numpy/numpy-tutorials
- formatting docstrings - https://github.com/numpy/numpydoc


Open issues and open PRs: how long should you wait? How can you be polite and still get attention?
What are the directories in numpy/numpy? What happens to the C sources?

# What will we do in the next hour?
- A brief history of NumPy: when, what, who?
- What drives NumPy: what are the goals?
- Communication channels and github repos
- **Building and testing NumPy**
- Take a look at some issues and PRs
- Q&A

# Let's take a look at the contributor guide

https://numpy.org/devdocs/dev/index.html

The truth is, it is quite simple

# In a terminal
```
# create a directory for working
mkdir /tmp/du_dir
cd /tmp/du_dir
# create and activate a conda environment
conda create -n data_umbrella python=3.8
conda activate data_umbrella
# regular git workflow
git clone https://github.com/numpy/numpy
cd numpy
pip install -r test_requirements.txt
python runtests.py
```

# Things to look out for

- macOS and the Accelerate framework (BLAS/LAPACK)
- Much of the code base is written in C (not C++)
  - generated code (`.c.src` templates)
  - the C-API and reference counting
- Building documentation:
  - The API reference is generated from the docstrings in the code, so you must have NumPy built and installed
  - Docstrings for C extensions are [injected from Python](https://github.com/numpy/numpy/blob/master/numpy/core/_add_newdocs.py) and use the [NumPyDoc](https://numpydoc.readthedocs.io/en/latest/example.html) format
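A short sketch of both points: the docstring of a C-implemented function is only visible once NumPy is imported, and new Python code is documented in the same NumPyDoc layout. The function `scale` is hypothetical, written only to show the section headings (`Parameters`, `Returns`, `Examples`) and a doctest.

```python
import doctest

import numpy as np

# np.zeros is implemented in C, but its docstring is a regular NumPyDoc string
# attached from Python, so it is available once NumPy is imported.
print(np.zeros.__doc__.strip().splitlines()[0])


def scale(a, factor=2.0):
    """
    Multiply every element of an array by a scalar.

    Parameters
    ----------
    a : array_like
        Input data.
    factor : float, optional
        Multiplier applied to every element. Default is 2.0.

    Returns
    -------
    ndarray
        A new array equal to ``a * factor``.

    Examples
    --------
    >>> scale([1, 2, 3])
    array([2., 4., 6.])
    """
    return np.asarray(a) * factor


# Run the Examples section as a doctest; it prints nothing when it passes.
doctest.run_docstring_examples(scale, {"scale": scale}, verbose=False)
```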

# What will we do in the next hour?
- A brief history of NumPy: when, what, who?
- What drives NumPy: what are the goals?
- Communication channels and github repos
- Building and testing NumPy
- **Take a look at some issues and PRs**
- Q&A

# The typical workflow is pretty standard

- Add a failing test (test driven development)
- Find where the function is implemented
- Work from there to figure out the logic that trips the test
- Fix it
- Run all the tests
- Push a PR and then interact with people
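A sketch of the first step: a test class in the plain-`assert` plus `numpy.testing` style that NumPy's own test files use. The class and method names are hypothetical, and the behavior checked is something `np.clip` already does, so these pass. When fixing a real bug you would write the test first and watch it fail before touching the implementation.

```python
import numpy as np
from numpy.testing import assert_array_equal


class TestClipExample:
    """Hypothetical tests in the style of NumPy's tests/test_*.py files."""

    def test_int_bounds(self):
        a = np.arange(5)
        assert_array_equal(np.clip(a, 1, 3), [1, 1, 2, 3, 3])

    def test_preserves_dtype(self):
        # Python int bounds should not upcast a small integer dtype.
        a = np.arange(5, dtype=np.int16)
        assert np.clip(a, 1, 3).dtype == np.int16


if __name__ == "__main__":
    # pytest would normally discover and run these; call them directly here.
    t = TestClipExample()
    t.test_int_bounds()
    t.test_preserves_dtype()
    print("all tests passed")
```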

# It is not trivial to find good "first issues"

Let's look at https://github.com/numpy/numpy together

https://github.com/numpy/numpy/issues/17764 (note the formatting problems)

https://github.com/numpy/numpy/pull/17878 (wow, that got complicated quickly)

Thanks!