<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="fig/cover-small.jpg">
<img align="left" style="padding-right:10px;" src="figures/PDSH-cover-small.png">

*This series of notebooks contains excerpts from the following resources:*

*a) the [Whirlwind Tour of Python](http://www.oreilly.com/programming/free/a-whirlwind-tour-of-python.csp) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/WhirlwindTourOfPython). The text and code are released under the [CC0](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/LICENSE) license.*

*b) the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook). The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*

*Additional content has been added by Emily Marasco to support ENSF 592: Programming Fundamentals for Data Engineers, Spring 2021, University of Calgary.*

<!--NAVIGATION-->
< [Introduction to Data Science](15-Intro-Data-Science.ipynb) | [Contents](Index.ipynb) | [Understanding Data Types in Python](17-Understanding-Data-Types.ipynb) >

# Introduction to NumPy

In the next set of chapters, we'll outline techniques for effectively loading, storing, and manipulating in-memory data in Python.
The topic is very broad: datasets can come from a wide range of sources and in a wide range of formats, including collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else.
Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.

For example, images, particularly digital images, can be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area.
Sound clips can be thought of as one-dimensional arrays of intensity versus time.
Text can be converted in various ways into numerical representations, perhaps binary digits representing the frequency of certain words or pairs of words.
No matter what the data are, the first step in making them analyzable will be to transform them into arrays of numbers.
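
As a rough illustration of the three cases above, the sketch below builds a tiny image, a short sound clip, and a word-count vector as arrays. It uses the NumPy package, which is introduced later in this chapter. The pixel values, the 5 Hz tone, and the three-word vocabulary are made up for the example and do not come from any real dataset.

```python
import numpy as np

# A tiny 3x4 grayscale "image": each entry is a pixel brightness (0-255).
# The values are invented for illustration.
image = np.array([[  0,  64, 128, 255],
                  [ 32,  96, 160, 224],
                  [ 16,  80, 144, 208]], dtype=np.uint8)

# A short "sound clip": one second of a 5 Hz sine wave, sampled 100 times.
t = np.linspace(0, 1, 100, endpoint=False)
sound = np.sin(2 * np.pi * 5 * t)

# A sentence turned into word counts over a small, hypothetical vocabulary.
vocabulary = ["data", "array", "python"]
words = "python data array data".split()
text = np.array([words.count(w) for w in vocabulary])

print(image.shape)  # (3, 4): two-dimensional
print(sound.shape)  # (100,): one-dimensional
print(text)         # [2 1 1]: counts for "data", "array", "python"
```

Real images, audio, and text need more careful handling (color channels, sampling rates, tokenization), but the end result is the same kind of object: an array of numbers.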

For this reason, efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science.
We'll now take a look at the specialized tools that Python has for handling such numerical arrays: the NumPy package and the Pandas package.

We'll begin by learning about NumPy. NumPy (short for *Numerical Python*) provides an efficient interface to store and operate on dense data buffers.
In some ways, NumPy arrays are like Python's built-in ``list`` type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.
NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.
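
To make the comparison with lists a little more concrete, here is a minimal sketch that doubles every element of a list and of an array holding the same values. The size `n` and the variable names are arbitrary choices for this example. The exact integer type NumPy picks depends on the platform, so the printed values may differ from one machine to another.

```python
import numpy as np

n = 1_000  # arbitrary size chosen for this example
py_list = list(range(n))
np_array = np.arange(n)

# Element-wise arithmetic: a list needs an explicit loop or comprehension,
# while an array applies the operation to every element at once.
doubled_list = [2 * x for x in py_list]
doubled_array = 2 * np_array

# An array stores fixed-size items of a single type in one data buffer,
# so its total size is simply item size times number of items.
print(np_array.dtype, np_array.itemsize, np_array.nbytes)
```

Both approaches give the same numbers; the difference is in how the data are stored and how the operation is expressed, which is what matters as arrays grow larger.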

If you've followed the ENSF 592 installation instructions, you already have NumPy installed and ready to go.
If you need to install it manually, you can go to http://www.numpy.org/ and follow the installation instructions found there.
Once you do, you can import NumPy and double-check the version:

In [1]:
import numpy
numpy.__version__

'1.19.1'

For the pieces of the package discussed here, I'd recommend NumPy version 1.8 or later.
By convention, you'll find that most people in the SciPy/PyData world will import NumPy using ``np`` as an alias:

In [2]:
import numpy as np

Throughout this topic, and indeed the rest of the book, you'll find that this is the way we will import and use NumPy.

EM: If you are having trouble running the numpy import or any other numpy code in these chapters, try the following steps:
1. From the Anaconda terminal, activate your ENSF 592 environment.
2. Uninstall NumPy: ``pip uninstall numpy``
3. Reinstall NumPy: ``conda install numpy``
4. Relaunch Jupyter and test! You will now likely have version 1.19.1 installed.
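
After reinstalling, a quick check along the following lines may help confirm which copy of NumPy Jupyter is actually using. This is only a sketch: the version parsing assumes the version string starts with numeric major and minor parts such as `1.19`, and the 1.8 threshold is the minimum version recommended earlier in this chapter.

```python
import numpy as np

# Show the version and the location NumPy was imported from.
# The path should point inside your ENSF 592 environment.
print(np.__version__, np.__file__)

# Compare the major and minor version numbers against the 1.8 minimum.
major, minor = (int(part) for part in np.__version__.split(".")[:2])
assert (major, minor) >= (1, 8), "NumPy is older than 1.8"
```

If the printed path points somewhere outside the environment, the environment was probably not activated before Jupyter was launched.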

More detailed documentation, along with tutorials and other resources, can be found at http://www.numpy.org.
