Let's say you want to use Python to work with some data. Perhaps it looks like this:

<img src="barton springs data screenshot4.png" width="700" align="left"/>

In basic Python, there are a number of different data types that can store and manipulate this data.

You can use **integers**:

In [None]:
site_id = 815540
type(site_id)

You can use **floating point** (decimal) numbers:

In [None]:
height = 3.26
type(height)

You can use **strings**:

In [None]:
site_name = "Barton Springs"
type(site_name)

In [None]:
day = "2016-02-15"
type(day)

The above data types hold one item at a time, but what if you want to represent a collection?

You can use **lists**:

In [None]:
heights = [2.45, 3.87, 9.33, 3.70]
type(heights)

You can use **tuples**, which are similar to lists but immutable (can't be modified) and are typically used for heterogeneous data where the order matters:

In [None]:
row = ("2016-2-15", 8566011, 2.4, 1.2, 4.7)
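
To make "immutable" and "order matters" concrete, here is a small sketch. The field names used when unpacking are assumptions made for illustration; the original row does not label its values.

```python
row = ("2016-2-15", 8566011, 2.4, 1.2, 4.7)

# Position carries the meaning, so fields are unpacked by order.
# These names are hypothetical labels, not taken from the data file.
day, site_id, mean_height, min_height, max_height = row

# Tuples are immutable: assigning to an item raises a TypeError.
try:
    row[2] = 9.9
    modified = True
except TypeError:
    modified = False

print(type(row), mean_height, modified)
```

If you need to change a value, build a new tuple or use a list instead.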

And you can use **dictionaries**, which map, or associate, one piece of data with another:

In [None]:
levels_by_day = {"2016-2-15": 2.4, "2016-2-16": 3.2, "2016-2-17": 1.7 }
levels_by_day["2016-2-16"]
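
A few more things you can do with a dictionary, as a rough sketch. The extra entry for "2016-2-18" is a made-up value added only for illustration.

```python
levels_by_day = {"2016-2-15": 2.4, "2016-2-16": 3.2, "2016-2-17": 1.7}

# Add a new key/value pair (invented value, for illustration only).
levels_by_day["2016-2-18"] = 2.9

# .get() returns a default instead of raising KeyError for a missing key.
missing = levels_by_day.get("2016-2-19", None)

# Find the key whose value is largest.
highest_day = max(levels_by_day, key=levels_by_day.get)

print(len(levels_by_day), missing, highest_day)
```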

Here's an example of using basic Python data types to look at data stored in a file:

In [None]:
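# 'with' closes the file automatically when the block ends
with open('barton creek.tsv', 'r') as file:
    data_list = []
    for line in file:
        # drop the trailing newline, then split the line on tabs
        data_list.append(line.rstrip('\n').split('\t'))
data_list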

In [None]:
type(data_list[1])

In [None]:
data_list[1][0]

In [None]:
data_list[1][2]
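
One thing to notice with this approach: every value read from the file is a string, so you have to convert it yourself before doing any math. The sketch below uses a small in-memory stand-in for the file; the column names and values are invented for illustration and are not the real contents of the data file.

```python
import io

# Hypothetical stand-in for a tab-separated file (invented columns and values).
sample = io.StringIO(
    "date\tsite id\theight\n"
    "2016-02-15\t815540\t3.26\n"
    "2016-02-16\t815540\t3.10\n"
)

rows = [line.rstrip("\n").split("\t") for line in sample]
header, records = rows[0], rows[1:]

# Each field is a str until you convert it.
field_type = type(records[0][2])

# Convert the third column to floats, then average it by hand.
height_values = [float(record[2]) for record in records]
mean_height = sum(height_values) / len(height_values)

print(header, field_type, mean_height)
```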

The built-in Python data types are very flexible, which is one of Python's great strengths, but that flexibility means they are not optimized for numerical computation: operations on large collections of numbers are slow. They also offer few built-in functions for manipulating complex data.

Thus, **numpy**: a scientific computing package which adds many useful features to basic Python.

One of the most important is the **ndarray**, an n-dimensional array. It is similar to a Python list, except that it:
* is homogeneous (can only hold data of the same type)
* is fixed-size (its size is set when it is created, and every item takes up the same amount of memory)
* supports mathematical operations, the way vectors and matrices do

Most of the time you will create one with the **array** function, and you will see the ndarray type referred to simply as an array.

Here, we create a numpy array:

In [None]:
import numpy as np
heights = np.array([2.45, 3.87, 9.33, 3.70])
heights
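
The "same type" rule can be seen by checking an array's `dtype`. This is a minimal sketch; the variable names are just for illustration.

```python
import numpy as np

# Mixing integers and a float: numpy converts everything to one common type.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# An integer array stays an integer array, even if you assign a float into it.
ints = np.array([1, 2, 3])
ints[0] = 2.9  # the fractional part is dropped to fit the integer type
print(ints[0], ints.dtype.kind)
```

A Python list would happily hold `2.9` next to the integers; the array keeps a single type for all of its items.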

Numpy arrays can be operated on mathematically:

In [None]:
heights/2

In [None]:
heights * heights
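
For contrast, here is a short sketch of how the same operator behaves on a plain list versus an array. The variable names are made up for this example.

```python
import numpy as np

heights_list = [2.45, 3.87, 9.33, 3.70]
heights_array = np.array(heights_list)

# On a list, * means repetition: the result has 8 items.
repeated = heights_list * 2

# On an array, * works element by element: the result still has 4 items.
doubled = heights_array * 2

# Common aggregations are available as methods.
tallest = heights_array.max()

print(len(repeated), doubled, tallest)
```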

By the way, if you need a reference in a hurry, you can always do:

In [None]:
help(heights.any)

To create a numpy array containing the integers in a certain range (the end value is not included):

In [None]:
np.arange(4, 100)
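
A few variations, as a quick sketch: `np.arange` also takes a step, and `np.linspace` is a related numpy function for when you want a set number of evenly spaced values instead.

```python
import numpy as np

# Start at 4, stop before 100.
whole = np.arange(4, 100)

# A third argument sets the step.
stepped = np.arange(0, 10, 2)

# linspace takes a count and includes both endpoints.
spaced = np.linspace(0.0, 1.0, 5)

print(whole[0], whole[-1], len(whole))
print(stepped)
print(spaced)
```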

To create a new array of any dimension filled with zeros:

In [None]:
np.zeros((2, 5))
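
The tuple you pass in is the shape: one number per dimension. A small sketch of inspecting and filling such an array (the names `grid` and `cube` are just for illustration):

```python
import numpy as np

grid = np.zeros((2, 5))
print(grid.shape)  # (2, 5): 2 rows, 5 columns
print(grid.dtype)  # float64 by default

# Set a single item by row and column position.
grid[0, 3] = 1.5

# Three dimensions, with an integer type instead of the default float.
cube = np.zeros((2, 3, 4), dtype=int)
print(cube.ndim, cube.size)
```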

We can do a speed comparison between multiplying two Python lists element by element and multiplying two numpy arrays:

In [None]:
list1 = list(range(1000000))
list2 = list(range(1000000))
%time result = [a*b for a,b in zip(list1, list2)]

In [None]:
array1 = np.arange(1000000)
array2 = np.arange(1000000)
%time result = array1 * array2
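
`%time` only works inside IPython or Jupyter. If you are running a plain Python script, a rough equivalent uses the standard library's `timeit` module. This is a sketch with a smaller size (100,000 items, an arbitrary choice) so it finishes quickly; exact timings will vary by machine.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for a quick run
list1 = list(range(n))
list2 = list(range(n))
array1 = np.arange(n, dtype=np.int64)
array2 = np.arange(n, dtype=np.int64)

# Run each version 10 times and report the total time.
list_time = timeit.timeit(lambda: [a * b for a, b in zip(list1, list2)], number=10)
array_time = timeit.timeit(lambda: array1 * array2, number=10)

print(f"lists:  {list_time:.4f} s")
print(f"arrays: {array_time:.4f} s")
```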

**pandas** builds on numpy to add useful data types and operations for reading, analyzing, and visualizing tabular data.

The core pandas data type is the **DataFrame**.

To pull data from a file into a DataFrame:

In [None]:
import pandas as pd
data = pd.read_table('barton creek.tsv')
data

To retrieve a specific piece of data:

In [None]:
data.loc[11, 'mean gage height (feet)']

To get a single row:

In [None]:
data.iloc[2]
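
The difference between `loc` and `iloc` is labels versus positions. Here is a sketch on a tiny hand-built table; the column names and values are invented for illustration and are not the real file contents.

```python
import pandas as pd

# Hypothetical miniature table (invented values), indexed by date strings.
df = pd.DataFrame(
    {"height": [2.4, 3.2, 1.7], "discharge": [40.0, 55.0, 31.0]},
    index=["2016-02-15", "2016-02-16", "2016-02-17"],
)

# loc uses labels: row label, then column label.
by_label = df.loc["2016-02-16", "height"]

# iloc uses integer positions: row 1, column 0.
by_position = df.iloc[1, 0]

# A single row comes back as a Series keyed by column name.
second_row = df.iloc[1]

print(by_label, by_position)
print(second_row)
```

When the index is the default 0, 1, 2, ... (as it is after reading a file without specifying an index), row labels and row positions happen to match.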

To get a single column (the pandas datatype called **Series**):

In [None]:
data['mean gage height (feet)']
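
A Series supports the same kind of math as a numpy array, plus filtering. A minimal sketch on an invented table (the column names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical miniature table with invented values.
df = pd.DataFrame({"height": [2.4, 3.2, 1.7], "discharge": [40.0, 55.0, 31.0]})

# One column name gives a Series.
height_column = df["height"]
average = height_column.mean()

# A comparison produces True/False per row, which can be used as a filter.
above_two = height_column[height_column > 2.0]

# A list of column names gives a DataFrame instead.
two_columns = df[["height", "discharge"]]

print(type(height_column), average)
print(above_two)
```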

To sort by a specific column:

In [None]:
data.sort_values('mean discharge (cubic feet / sec)')
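
Note that `sort_values` returns a new, sorted DataFrame and leaves the original alone. A quick sketch on an invented table (hypothetical column names and values), also showing descending order:

```python
import pandas as pd

# Hypothetical miniature table with invented values.
df = pd.DataFrame({"height": [2.4, 3.2, 1.7], "discharge": [40.0, 55.0, 31.0]})

# Largest discharge first.
sorted_df = df.sort_values("discharge", ascending=False)

print(sorted_df)
print(df)  # the original order is unchanged
```

The row labels travel with their rows, so the sorted table's index shows where each row came from.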

To plot:

In [None]:
data['mean gage height (feet)'].plot()

You can do a LOT more, but those are the basics to help you recognize what you see in examples.
Some recommended resources to learn more:
 * pandas video tutorials (super clear series of ~10 min videos): http://www.dataschool.io/easier-data-analysis-with-pandas/
 * pandas tutorials in Jupyter Notebooks: https://pandas.pydata.org/pandas-docs/stable/tutorials.html